##### 21 Oct 2019

# Introduction to Python: Review and Discussion

Some odds 'n ends from the first two Python lectures, plus some in-class coding exercises

* variable names
* side effects
* Python 2 _vs_ Python 3
* `format` method
* CpG islands (in-class project)

## Variable Names: Python Conventions

The rules (from lecture notes named 'Python I'):
* names start with a letter or underscore
* can have any number of characters
* characters after the first can be letters, digits, underscores

These are all legal variable names in Python:

In [15]:
a = 10

In [16]:
A = 3           # a different variable -- case is significant

In [17]:
print(a, A)

10 3


In [18]:
first_name = 'Hermione'

In [19]:
LastName = 'Granger'

In [20]:
Area51Warning = 'Here there be 👽'

In [21]:
print(first_name, 'said', Area51Warning)

Hermione said Here there be 👽
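
The naming rules can also be checked from inside Python. The sketch below uses the built-in string method `isidentifier`; the list of candidate names is made up for illustration and is not part of the lecture notes.

```python
# Hypothetical candidate names, chosen only to illustrate the rules
candidates = ['a', 'first_name', 'Area51Warning', '_tmp', '51area', 'last-name']

for name in candidates:
    # isidentifier() is True when the string follows the naming rules
    print(name, name.isidentifier())
```

The last two names fail: one starts with a digit and the other contains a hyphen. Note that `isidentifier` also returns `True` for reserved words such as `def`, which still cannot be used as variable names.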


### PEP 8 

Over the years the Python community has adopted a set of standards for names

The document that describes these standards is known as "PEP 8"

* see https://www.python.org/dev/peps/pep-0008

The part that is relevant for us:

* "Function names should be lowercase, with words separated by underscores as necessary to improve readability."

* "Variable names follow the same convention as function names."

* "mixedCase is allowed only in contexts where that's already the prevailing style ... to retain backwards compatibility."

### 🛠  &nbsp; Bi 410/510 Standards 

Variable and function names should start with lower case letters
* yes:  &nbsp; `celsius`, &nbsp;` plural`, &nbsp;` t`
* no: &nbsp;&nbsp;&nbsp; `Celsius`, &nbsp; ` T`

Use underscores to separate words in long names (_aka_ "snake case")
* yes:  &nbsp; `last_name`, `first_name`, `celsius_to_fahrenheit`
* no: &nbsp;&nbsp;&nbsp; `LastName`, `CelsiusToFahrenheit` (a format called "camel case")

Names should start with lower case letters
* yes: &nbsp; `area_51_warning`
* no: &nbsp;&nbsp;&nbsp; `Area_51_warning` &nbsp; or &nbsp; `Area51Warning`
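
To make the snake case convention concrete, here is a minimal sketch of a converter. The function name `to_snake_case` is hypothetical (it is not part of the course materials), and the sketch only handles names whose words start with upper case letters.

```python
def to_snake_case(name):
    "Convert a name like 'LastName' to 'last_name' (hypothetical helper)"
    result = ''
    for ch in name:
        # put an underscore before each upper case letter except the first
        if ch.isupper() and result:
            result += '_'
        result += ch.lower()
    return result

print(to_snake_case('LastName'))
print(to_snake_case('CelsiusToFahrenheit'))
```

Digits are not treated as word boundaries here, so a name like `Area51Warning` would still need to be fixed by hand.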

## Descriptive Variable Names (in Any Programming Language)

Here are two versions of the function that converts temperatures from Fahrenheit to Celsius:

In [1]:
def cels(t):
    f = (9/5) * (t-32)
    return f

In [2]:
def foo(n):
    x = (9/5) * (n-32)
    return x

They both do the same thing; the only difference is the names of the function and the variables.

### 🛠  &nbsp; Use Descriptive Function Names and Variable Names 

The first version (`cels`) is clearly better

Names need to be **descriptive**
* other people who read the program should understand what the name is used for
* what does a function compute?
* what values are stored in a variable?
  
"Other people" includes "future you" -- you will be amazed how much you can forget about your own programs a few months after you write them!

#### Single Letters? 

The name `f` in this function stands for "Fahrenheit value".

It's OK for such a small function.

A better choice might be `fahr` or even `fahrenheit`
* names that are too long are tedious to write (and to read!)

💎 &nbsp; A rule of thumb: names used in many places in the notebook (like function names) should be longer and more descriptive.

#### In Defense of `foo` 

There are times when a nonsense name like `foo` is the right choice
* the code might illustrate how to write a function, showing the structure of a function definition
* with a descriptive name readers might be misled, _e.g._ into thinking this is how to write a temperature converter, but maybe not some other kind of function

The same goes for the arithmetic expression -- if the goal is to show how precedence rules work, the variable names are not as important

## Returning a Value _vs_ Printing a Value

Here are two versions of the temperature converter:

In [24]:
def cels(t):                    # the version shown earlier
    f = (9/5) * (t-32)
    return f

In [25]:
def pcels(t):                   # a version that prints the result
    f = (9/5) * (t-32)          # (the 'p' in the name means 'print')
    print(f)

The notebook looks very similar when we run each version:

In [3]:
cels(70)

68.4

In [27]:
pcels(70)

68.4


### `cels` Returns a Value 

Notice how the code cell that calls `cels` has an Out section
* the In prompt shows the statement (expression) that is evaluated
* the Out section below it is the result
* in this example, the result is **the value returned by the call to `cels`**

To emphasize this, let's save the value returned by `cels`, and then look at that value:

In [28]:
cv = cels(70)

In [29]:
print(cv)

68.4


### `pcels` Prints a Value 

Look carefully at the code cell where `pcels` was called
* there is no Out section
* that's because the function **didn't return a result**
* the "68.4" is a string printed by the function itself, not a value returned by the call

Notice what happens when we save the result of a call to `pcels`:

In [30]:
pcv = pcels(70)

68.4


In [31]:
print(pcv)

None


### `None` 

`None` is a special value in Python (similar to `True` and `False`, the Boolean constants)
* it means "no value" or "no object"

If a function (like `pcels`) does not have a `return` statement it will return `None`
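
Here is a small sketch of the same behavior with a different function. The name `shout` is made up for this example; the point is only that a function with no `return` statement hands back `None`.

```python
def shout(word):
    "Print word in upper case (hypothetical example; no return statement)"
    print(word.upper())

result = shout('hello')     # prints HELLO as a side effect
print(result is None)       # True: the call itself returned None
```

Comparing with `is None` is the usual way to test for this value.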

## Side Effects 

In computer science jargon, anything printed by a function is a **side effect** of the function call
* the term implies that the main reason to call a function is to get the value computed by the function
* anything else is extraneous

### 🛠  &nbsp; Your Functions Should Return Values

Unless the project explicitly says a function should print something, your functions should return values, not print them


Almost all project specs will say "write a function that returns ..."
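
One practical reason for this rule: a returned value can be used in a larger expression, but a printed value cannot. The sketch below uses two made-up functions, `double` and `pdouble`, to show the difference.

```python
def double(n):
    "Return twice the value of n"
    return 2 * n

def pdouble(n):
    "Print twice the value of n (returns None)"
    print(2 * n)

total = double(3) + 1       # works: double(3) evaluates to 6
print(total)

try:
    pdouble(3) + 1          # prints 6, then fails because None + 1 is not defined
except TypeError as err:
    print('TypeError:', err)
```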

## When to Use `print` 

There are times when it is appropriate to use a `print` statement

### The Function Is Designed Specifically to Produce Output

Some projects will ask you to write a function that generates some output, _e.g._
* "implement a function that prints a table that shows $f(x)$ for $0 \leq x \leq 10$"
* "the function `foo` should write a file containing ..."

But even in these cases there will be a clear "separation of concerns"
* your notebook will have some "pure" functions that compute values
* other functions will format and print the results

💎 &nbsp; Rarely, if ever, will a function be used both to create results and print them.
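
A minimal sketch of this separation, assuming a made-up function `square` as the thing to tabulate: one function computes and returns a value, and a second function is responsible only for output.

```python
def square(x):
    "Pure function: compute a value and return it"
    return x * x

def print_table(lo, hi):
    "Output function: print x and square(x) for each integer from lo to hi"
    for x in range(lo, hi + 1):
        print(x, square(x))

print_table(0, 10)
```

Because `square` does no printing, it can be tested on its own and reused in other calculations.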

### Temporarily Add `print` Statements to Help Debug a Function

One very effective technique for debugging a program is to add `print` statements

Example:
* print the values used in a Boolean expression immediately before an `if` statement
* print messages in the `if` clause and `else` clause to let you know which one is executed

Example: here is a function that should return a string, but it doesn't seem to work when we pass it the number 10:

In [32]:
def high_low(x):
    "Return 'low' if x is less than 10, otherwise return 'high'"
    if x < 10:
        return 'low'
    elif x > 10:
        return 'high'

In [33]:
high_low(25)

'high'

In [34]:
high_low(5)

'low'

In [35]:
high_low(10)       ### an error -- the result should be "high" but we're getting None

Here is the same function, but I've added `print` statements to help debug it.

In [3]:
def high_low(x):
    "Return 'low' if x is less than 10, otherwise return 'high'"
    print('x =', x)
    if x < 10:
        print('x < 10')
        return 'low'
    elif x > 10:
        print('x > 10')
        return 'high'
    print("I shouldn't be here")

In [37]:
high_low(25)

x = 25
x > 10


'high'

In [38]:
high_low(5)

x = 5
x < 10


'low'

In [2]:
high_low(10)       # the first print is executed, but not the 2nd or the 3rd... why?

x = 10
I shouldn't be here


#### How Should We Fix This Bug? 

Let's edit the code in this cell (so when the notes are published on Canvas we'll see both the "before" and "after" versions):

New spec: return "high" if $x \geq 10$

In [None]:
def high_low(x):
    if x < 10:
        return 'low'
    return 'high'           # covers x == 10 as well as x > 10

In [None]:
high_low(25)

In [None]:
high_low(5)

In [None]:
high_low(10)

### Scaffolding 

When your program is working you can delete the print statements.

Statements that are added during construction and then torn down when the job is done are referred to as **scaffolding**

#### 💎 &nbsp; Comment / Uncomment 

Instead of deleting the print statements consider "commenting them out" instead.
* add a # to the beginning of the line (the very first character, not indented)

That way you can "uncomment" the print statements if you ever need them again in the future.
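
As a sketch, this is what `high_low` might look like with its scaffolding commented out, assuming the revised spec (return "high" if $x \geq 10$). Python ignores comment lines when it checks indentation, so a `#` in the first column is legal inside a function body.

```python
def high_low(x):
    "Return 'low' if x is less than 10, otherwise return 'high'"
#    print('x =', x)
    if x < 10:
#        print('x < 10')
        return 'low'
#    print('x >= 10')
    return 'high'
```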

## Python 2 _vs_ Python 3 

Our textbook (_Practical Computing for Biologists_) has examples written in Python 2, but we're using Python 3

Almost everything we've seen so far (expressions, `def` statements, `if` statements) is the same in both versions

One difference:
* in Python 2, `print` is a **keyword**
* in Python 3, `print` is a **builtin function**

A **print statement** in Python 2 starts with the word `print` and is followed by a space and a list of values to print
```
print 6*7
print first_name, 'said', area_51_warning
```

In Python 3, we call a function named `print` and pass it a list of values to print:
```
print(6*7)
print(first_name, 'said', area_51_warning)
```

Old habits:  we still say a call to `print` is a "print statement"

### Coding Standards 

The PEP 8 document says that even though spaces between function names and opening parens are legal, they should be avoided
```
print (6*7)          ## <--- bad style
print(6*7)           ## OK
```

### 🛠  &nbsp;  Do Not Put Spaces After Function Names

```
print ('hello')        ## No
cels (70)              ## No
```

```
print('hello')         ## Yes
cels(70)               ## Yes
```

### 🛠  &nbsp;  No Spaces In Function Definitions, Either

```
def cels (t):          ## No
    ...
```

```
def cels(t):           ## Yes
    ...
```

## Formatting Output 

You may have noticed that Python automatically adds spaces between values when they are printed:

In [40]:
a = 7
b = 6

print(a,b)

7 6


That can be a problem when we want to print nicely formatted output
* the output below would look a lot better if there wasn't a space after the dollar sign

In [41]:
payment = 432.50

print('You will pay $', payment, 'per month')

You will pay $ 432.5 per month


### Output Templates 

Python allows us to define a "template" that consists of text and placeholders
* we'll call a method named `format`, passing it values to insert into the template

Placeholders are pairs of braces ("curly brackets")
* notice how in this template the dollar sign is right next to the placeholder so there won't be any extra space

In [42]:
t = 'You will pay ${} per month'

### `format` 

The template is a string, and we can use it by calling a string method named `format`

In [43]:
t.format(432.50)

'You will pay $432.5 per month'

In [44]:
t.format(11.99)

'You will pay $11.99 per month'

### Details 

In general a template can have more than one placeholder
* if there are $n$ placeholders, pass $n$ arguments in the call
* the arguments are turned into strings and inserted into the template
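
A short sketch of both points, using a made-up template and made-up values: two placeholders take two arguments, and passing too few arguments raises an `IndexError`.

```python
template = '{} has {} bases'            # two placeholders
line = template.format('seq1', 120)     # two arguments; 120 is turned into '120'
print(line)

try:
    template.format('seq1')             # too few arguments
except IndexError as err:
    print('IndexError:', err)
```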

###  You Don't Need to Define the Template Ahead of Time

It's common to see a `print` that includes a template and a call to `format` all in the same statement:

In [45]:
print('{} × {} = {}'.format(a, b, a*b))

7 × 6 = 42


### Placeholders Can Have Types 

We can include extra information inside a placeholder.  In this template
* `:3d` means "the value inserted here is an integer, printed in a field at least 3 characters wide (padded with spaces on the left)"

In [46]:
for n in [2,3,5,7,11,13,17,19,23]:
    print('{:3d} × 7 = {:3d}'.format(n,7*n))

  2 × 7 =  14
  3 × 7 =  21
  5 × 7 =  35
  7 × 7 =  49
 11 × 7 =  77
 13 × 7 =  91
 17 × 7 = 119
 19 × 7 = 133
 23 × 7 = 161
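
The 3 in `:3d` is a minimum width, not a limit. In this small sketch the square brackets are added only to make the padding visible.

```python
narrow = '[{:3d}]'.format(7)         # padded to 3 characters
exact = '[{:3d}]'.format(161)        # exactly 3 digits, no padding needed
wide = '[{:3d}]'.format(12345)       # more than 3 digits: nothing is cut off

print(narrow)    # [  7]
print(exact)     # [161]
print(wide)      # [12345]
```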


Here is another format that prints a dollar amount.  This time the placeholder says the amount will be a float printed with 2 digits following the decimal point.

In [None]:
for n in [6,12,24,36]:
    total = n * 11.99
    print('Total payments over {:2d} months will be ${:.2f}'.format(n, total))

### Mini-Language 

There are a lot of things we can include in a template
* print numbers in decimal, binary, hexadecimal (base 16), ...
* print numbers in scientific notation, _e.g._ $3.42 \times 10^3$
* align strings on the left or right side or center them

Learn more about this "format mini-language" at [docs.python.org](https://docs.python.org/3.7/library/string.html#formatspec)
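
Here is a brief sketch of a few of these options. The values are arbitrary, and the square brackets in the last three lines are there only to show where the padding goes.

```python
n = 42
print('{:b}'.format(n))            # binary: 101010
print('{:x}'.format(n))            # hexadecimal: 2a
print('{:.2e}'.format(3420.0))     # scientific notation: 3.42e+03

print('[{:<8}]'.format('GATC'))    # left aligned:  [GATC    ]
print('[{:>8}]'.format('GATC'))    # right aligned: [    GATC]
print('[{:^8}]'.format('GATC'))    # centered:      [  GATC  ]
```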

## In-Class Project: CpG Islands

Let's write a program together.  

Earlier we wrote a function to compute GC content (percentage of letters in a DNA sequence that are either G or C).
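
As a reminder, a GC content function might look something like the sketch below. This is not necessarily the version written in class; it assumes the sequence is a non-empty string of upper case letters, and the name `gc_content` is an assumption.

```python
def gc_content(seq):
    "Return the percentage of letters in seq that are G or C"
    gc = seq.count('G') + seq.count('C')     # number of G's plus number of C's
    return 100 * gc / len(seq)

print(gc_content('ATGC'))
print(gc_content('GATTACA'))
```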

The new program will compute the CpG ratio: a statistic that indicates whether there are more CG dinucleotides than expected (a metric used by "gene finder" algorithms).


#### Notebook 

Download `CpG.ipynb` from the Bi 410 server -- start a Docker shell, then
```
$ cd Bi410
$ download CpG.ipynb
```

### Methodology 

Here is our recommended process for writing programs

#### (1) Create the basic notebook structure common to all projects: 

For Bi 410/510 projects the notebooks are (probably) already created.  If not, we suggest the following structure:

* an **introduction** that gives necessary background, including examples
* a section for a **specification**
  * a precise statement of the arguments passed to the function and the object(s) that will be returned
* code cells for the **implementation**
  * to start with, just put in a "stub" that has the `def` statement and `pass`
* code cells for **tests** (calls to the function)
  * for each test specify the expected results
  * suggestion: write the test cells even before implementing the function
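
A sketch of what the implementation and test cells might contain at this stage. The function name `cpg_ratio` and its single argument are assumptions made for illustration; the expected value in the test is left for you to fill in from the spec.

```python
# Implementation cell: a stub with just the def statement and pass
def cpg_ratio(seq):
    "Return the CpG ratio of the DNA sequence seq"
    pass            # to be replaced with real code, one step at a time

# Test cell: call the function and note the expected result next to the call
print(cpg_ratio('ACGCGT'))       # prints None until the stub is filled in
```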


#### 💎 &nbsp; (2) Write a "to do" list 

For all but the simplest projects (something the size and complexity of the `cels` function) you're going to need a plan

We suggest you write down the steps, in English, that you need to accomplish

Jupyter Notebooks are great for this strategy:  add a new markdown cell between the spec and the implementation, and write down the things you need to do

#### 💎&nbsp; 💎 &nbsp; (3) Code and Test 

Now you can work your way through the "to do" list

★ Each time you add a statement, or a small group of statements:
* execute the code cell to (re)define the function
* execute the code cells with your tests

At first the answer will be incorrect, but you'll be able to test what you just implemented, to make sure it works

#### 💎&nbsp; (4) Add Sandbox Cells as Needed 

Another great thing about Jupyter:  we can add code cells to try out expressions, test string methods, _etc_.

Example:  the GC project will use a string method named `count`.  If you're not sure how it works, add some code cells and type some expressions that test how `count` works.

You can add sandbox cells at any time -- while you are making the To Do list, or if you get stuck while implementing one of the steps.
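
For example, a sandbox cell for exploring `count` might look like this sketch (the test strings are made up):

```python
s = 'GATTACA'                # a made-up test string
print(s.count('A'))          # 3
print(s.count('TT'))         # 1: count works with longer substrings, too
print(s.count('X'))          # 0: no error when the substring is absent
print('AAAA'.count('AA'))    # 2: matches do not overlap
```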

### Continual Testing 

The test-while-coding strategy described above is widely used in the software industry and is **strongly recommended** for Bi 410/510

👉 &nbsp; &nbsp; Find mistakes as soon as you make them

👉 &nbsp; &nbsp; If you put off testing too long, mistakes will accumulate, and when you finally start debugging you'll have a much harder time finding the "needles in a haystack"