<a name="top"></a>

# Introduction to Python Programming for Bioinformatics. Lesson 2.

<details>
<summary>
About this notebook
</summary>

This notebook was originally written by [Marc Cohen](https://github.com/mco-gh), an engineer at Google. The original source can be found on [Marc's short link service](https://mco.fyi/), starting with [Python lesson 0](https://mco.fyi/py0). I encourage you to work through that notebook if you find some details missing here.

Rob Edwards edited the notebook, adapted it for bioinformatics, using some simple geneticy examples, condensed it into a single notebook, and rearranged some of the lessons, so if some of it does not make sense, it is Rob's fault!

It is intended as a hands-on companion to an in-person course, and if you would like Rob to teach this course (or one of the other courses) don't hesitate to get in touch with him.

</details>
<details>
<summary>
Using this notebook
</summary>

You can download the original version of this notebook from [GitHub](https://linsalrob.github.io/ComputationalGenomicsManual/Python/Python_Lesson_2.ipynb) and from [Rob's Google Drive]()

**You should make your own copy of this notebook by selecting File->Save a copy in Drive from the menu bar above, and then you can edit the code and run it as your own**

There are several lessons, and you can do them in any order. I've tried to organise them in the order I think most appropriate, but you may disagree!

</details>

<a name="lessons"></a>

# Lesson Links

* [Lesson 2 - Expressions](#Lesson-2---Expressions)
  * [Constants vs. Variables](#Constants-vs.-Variables)
  * [Data Types](#Data-Types)
  * [The Boolean (bool) Type](#The-Boolean-(bool)-Type)
  * [The None Type](#The-None-Type)
  * [Comparison Operators](#Comparison-Opererators)
  * [Boolean Operators - and, or, and not](#Boolean-Operators---and,-or,-and-not)
  * [Order of Evaluation](#Order-of-Evaluation)
  * [Python Precedence Rules](#Python-Precedence-Rules)
  * [F strings](#F-strings)

Previous Lesson: [GitHub](Python_Lesson_1.ipynb) | [Google Colab](https://colab.research.google.com/drive/11rcjrwznjZS-qoFoEcCQYF85PCRtN9dr)

Next Lesson: [GitHub](Python_Lesson_3.ipynb) | [Google Colab](https://colab.research.google.com/drive/1NbWawPfWAQV2x56rG0SvcMNpXI7sn3R0)


# Lesson 2 - Expressions


Things you'll learn in this lesson:
- More about types in Python
- Boolean operators
- How to combine constants, variables, and operators into arithmetic, boolean, and comparison expressions
- Operator precedence
- The magical f-string

Link to the original version of this notebook on [Marc's short link service](https://mco.fyi/py2)


# Constants vs. Variables

* Literal values (like `"Rob"` and `2010`) are called constants because their value is fixed.

* Variables, by contrast, are names whose associated value may change (or *vary*) over time.

* The data a variable refers to may be simple, e.g. a number or a string, or it may be complex, e.g. a list or an object (we'll learn about those later).
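A minimal sketch of the difference, using made-up values (the name `genome_size` and the numbers are only for illustration):

```python
# A constant (literal) always stands for the same value.
print(2010)

# A variable is a name; the value it refers to can change over time.
genome_size = 4500000      # hypothetical starting value
print(genome_size)
genome_size = 5000000      # the same name now refers to a new value
print(genome_size)

# A variable can also refer to something more complex, such as a list.
bacteria = ["E. coli", "Salmonella"]
print(bacteria)
```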

# Data Types

In Python, values have a *type*. We already saw three data types in the previous lesson. We'll take a look at a few other types.

# The Boolean (`bool`) Type

Python supports a special type called booleans, written `bool` in Python, which are used to indicate whether something is true or false. Booleans have one of two possible values:

* `True`
* `False`

When evaluating a **number** as a boolean, the following rules apply:

* 0 is `False`
* 0.0 is `False`
* all other numerical values are `True`

When evaluating a **string** as a boolean, the following rules apply:

* the empty string (`""` and `''`) is `False`
* all other strings are `True`

If it's something, it's `True`. Otherwise it's _NOT_.
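You can check these rules yourself with the built-in `bool()` function, which converts any value to `True` or `False`. This short sketch uses a list and a `for` loop, which we haven't covered yet, only to avoid repeating the same line several times; the example values are arbitrary.

```python
# A mix of numbers and strings to test (arbitrary example values).
values = [0, 0.0, 1, -3.5, "", "ATG", " "]

# bool() applies Python's truth rules to each value.
truthiness = [bool(v) for v in values]

for value, truth in zip(values, truthiness):
    print(repr(value), "->", truth)
```

Note that a string containing only a space (`" "`) is not empty, so it counts as `True`.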


# The None Type

Python has a special value called `None` (its type is `NoneType`), and it means *no value*.

It's a good choice when you want to initialize a variable without an obvious choice for the initial value, like this:

```bacteria = None```

`None` always evaluates to `False` in boolean expressions.
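Here is a small sketch of `None` in action. The `is None` test is not covered in this lesson, but it is the check you will most often see in Python code for "has this variable been given a real value yet?". The variable names are only for illustration.

```python
bacteria = None                     # no value chosen yet

none_as_bool = bool(bacteria)       # None counts as False
was_missing = bacteria is None      # True: nothing assigned yet
print(none_as_bool, was_missing)

bacteria = "E. coli"                # now the variable has a real value
still_missing = bacteria is None    # False
print(still_missing)
```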


<a name="Comparison-Opererators"></a>

# Comparison Operators

As their name suggests, comparison operators allow us to compare values and result in a boolean type indicating whether the comparison is `True` or `False`.

The following table summarizes the most commonly used operators in Python, along with their definition when applied to numbers and strings.

|operator|operation on numbers|operation on strings|
|--------|--------------------|--------------------|
|==|equal to|equal to|
|!=|not equal to|not equal to|
|>|greater than|lexicographically greater than|
|>=|greater than or equal to|lexicographically greater than or equal to|
|<|less than|lexicographically less than|
|<=|less than or equal to|lexicographically less than or equal to|

(Remember the crocodile!)
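A short sketch of comparisons on numbers and strings, with made-up values. "Lexicographically" means strings are compared character by character, like words in a dictionary, except that all uppercase letters sort before all lowercase letters.

```python
genome_size = 4500000   # hypothetical values
min_size = 1000000

bigger = genome_size > min_size       # True
same = genome_size == min_size        # False

gene_order = "recA" < "rpoB"          # True: "r" ties, then "e" comes before "p"
case_order = "ATG" < "atg"            # True: uppercase sorts before lowercase
prefix_order = "ATG" < "ATGC"         # True: a shorter prefix sorts first

print(bigger, same, gene_order, case_order, prefix_order)
```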


## Challenge

Which boolean value (`True` or `False`) does each of these expressions evaluate to?

* `123 == 10`
* `10 == 123`
* `123 == 123`
* `123 != 321`
* `123 != 123`
* `age == 65`
* `age != min_age`

* `"E. coli" == "Salmonella"`
* `"E. coli" == "E. coli"`
* `"E. coli" == "E.coli"`
* `"E. coli" != "e. coli"`

### `>` and `>=`
* `123 > 10`
* `10 > 123`
* `123 > 123`
* `123 >= 123`


# Boolean Operators - `and`, `or`, and `not`

Boolean operators are special operators in Python that let you combine boolean values in logical ways corresponding to how we combine truth values in the real world. An example of a boolean **and** expression would be "I'll buy a new phone if I like the features **and** the price is low". There are three main boolean operators: `and`, `or`, and `not`. We'll look at examples of each in the next cells.

## Boolean `and`

* `A and B`

is `True` only when both A and B are `True`, otherwise it's `False`.

Example:

* I ride my bike only when it's both sunny and warm.
* In other words, if `is_sunny` and `is_warm` are both `True` then `is_sunny and is_warm` is `True` so I **will** ride my bike.

In Python...
```
if is_sunny and is_warm:
    print("ride bike")  # runs only when both are True
```

We haven't learned about `if` statements yet, so don't worry if the previous construct looks unfamiliar. It's a simple way of checking the value of a boolean expression, and we'll dive deeper into `if` statements shortly.


In [None]:
is_sunny = False
is_warm = False
print(is_sunny and is_warm)

False


### Truth Table for boolean `and`
|`var1`|`var2`|`var1 and var2`|
|------|------|---------------|
|`False`|`False`|`False`|
|`True`|`False`|`False`|
|`False`|`True`|`False`|
|`True`|`True`|`True`|

## Boolean `or`

* `A or B`

is `True` when either A or B is `True`, or when both are `True`, otherwise it's `False`.

Example:

* I ride my bike when it's sunny, or warm, or both.
* In other words, if `is_sunny` or `is_warm` (or both) are `True` then `is_sunny or is_warm` is `True` so I **will** ride my bike.

In Python...
```
if is_sunny or is_warm:
    print("ride bike")  # runs when at least one is True
```

### Truth Table for boolean `or`

|`var1`|`var2`|`var1 or var2`|
|------|------|---------------|
|`False`|`False`|`False`|
|`True`|`False`|`True`|
|`False`|`True`|`True`|
|`True`|`True`|`True`|
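To see one row of this table in action, here is a small sketch using the same bike-riding variables, with values chosen for illustration. Comparing `or` with `and` on the same inputs shows the difference between the two operators.

```python
is_sunny = False
is_warm = True

either = is_sunny or is_warm    # True: at least one is True
both = is_sunny and is_warm     # False: not both are True

print(either)
print(both)
```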

## Boolean `not`

* `not A`

is `True` when A is `False`, and `False` when A is `True`.

### Truth Table for boolean `not`

|`var1`|`not var1`|
|------|---------|
|`False`|`True`|
|`True`|`False`|
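A minimal sketch of `not`, with illustrative variable names. Because `not` first evaluates its operand as a boolean, it also works on numbers and strings using the truth rules described earlier.

```python
has_plasmid = False
no_plasmid = not has_plasmid    # True: flips False to True
print(no_plasmid)

sequence = ""                   # an empty string counts as False...
is_empty = not sequence         # ...so `not sequence` is True
print(is_empty)
```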

# Expressions Revisited

* Python lets us combine variables, constants and operators into larger units called expressions.
* Expressions appear in many places
  * assignment statements. Here we are assigning the new number to the same variable
    * `age = age + 1    # we do this every birthday`
    * we could write this in two steps:
       * `tmp = age + 1`
       * `age = tmp`
    * but using one assignment is simpler, easier, and cleaner
  * function calls
    * `print(age * 365)  # approximate number of days alive`
* As we learn more, we'll see expressions popping up all over the place

In [None]:
age = 22
print(age)
age = age + 1
print(age)
age += 1
print(age)

22
23
24


In [None]:
age = 22
days_per_year = 365
days_old = age * days_per_year
print(f"I was {days_old} days old on my last birthday!")

I was 8030 days old on my last birthday!


## Types of Expressions

* arithmetic expressions

`genome_size = chromosome_1_size + chromosome_2_size`

* comparative expressions

`genome_size == 0`

* boolean expressions

`chromosome_1 and plasmid`

* combinations of the above

`plasmid and (chromosome_1_size + chromosome_2_size + plasmid_size) < genome_size`
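The sketch below evaluates one expression of each kind. All the sizes, the `plasmid` flag, and the `max_genome_size` limit are hypothetical values chosen only so the code runs.

```python
chromosome_1_size = 3000000   # hypothetical sizes in bp
chromosome_2_size = 1000000
plasmid_size = 50000
plasmid = True                # hypothetical flag: does this genome carry a plasmid?
max_genome_size = 5000000     # hypothetical upper limit

# arithmetic expression: the result is a number
genome_size = chromosome_1_size + chromosome_2_size

# comparative expression: the result is a boolean
is_empty = genome_size == 0

# boolean expression: combines two truth values
has_both = plasmid and (chromosome_2_size > 0)

# combination: arithmetic, then comparison, then boolean `and`
fits = plasmid and (chromosome_1_size + chromosome_2_size + plasmid_size) < max_genome_size

print(genome_size, is_empty, has_both, fits)
```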


## Order of Evaluation

How does Python know the correct order to evaluate a complex expression?

Example: `4 + 1 * 5`

Is that `(4 + 1) * 5`, which is `25`?  
Or is it `4 + (1 * 5)`, which is `9`?

Another example: `True or False and False`

Is that `(True or False) and False`, which is `False`?  
Or is it `True or (False and False)`, which is `True`?

Python uses operator precedence rules to avoid this ambiguity and evaluate expressions in a predictable way.
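You can ask Python directly which reading it uses. This sketch evaluates both examples with and without explicit parentheses:

```python
# Multiplication happens before addition.
implicit_sum = 4 + 1 * 5              # same as 4 + (1 * 5)
forced_sum = (4 + 1) * 5              # parentheses change the order
print(implicit_sum, forced_sum)

# `and` is evaluated before `or`.
implicit_bool = True or False and False      # same as True or (False and False)
forced_bool = (True or False) and False      # parentheses change the order
print(implicit_bool, forced_bool)
```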

## Python Precedence Rules

This is a subset of the complete rules (in order of highest to lowest precedence):

* parentheses (innermost to outermost, left to right)
* exponentiation (right to left, so `2 ** 3 ** 2` is `2 ** (3 ** 2)`)
* multiplication, division, modulus (left to right)
* addition, subtraction (left to right)
* comparisons (left to right)
* boolean not
* boolean and
* boolean or

[The Official Rules](https://docs.python.org/3/reference/expressions.html#operator-precedence)
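A few small expressions that exercise the rules in the list above. The comments show how Python groups each one; the variable names are arbitrary.

```python
a = 2 + 3 * 4 ** 2      # 4 ** 2 first (16), then 3 * 16 (48), then 2 + 48
b = 10 - 4 - 3          # same precedence: left to right, (10 - 4) - 3
c = 1 + 2 > 2           # arithmetic before comparison: (1 + 2) > 2
d = not 1 == 2          # comparison before `not`: not (1 == 2)
e = not True or True    # `not` before `or`: (not True) or True
f = True or True and False   # `and` before `or`: True or (True and False)

print(a, b, c, d, e, f)
```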

## Practical Advice

**When in doubt, use parentheses.**

Coders make liberal use of parentheses because:
* You don't need to remember the precedence rules.
* You don't have to worry about surprises.
* It makes code more readable.
* It eliminates any ambiguity.

For example, we could write this expression, which evaluates `A and B` first, then `C and D`, and finally takes the boolean `or` of the two preceding results:

`A and B or C and D`

but we much prefer to make the grouping explicit, like this, so we don't have to _think_ (about precedence rules) every time we look at this code:

`(A and B) or (C and D)`
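If you want to convince yourself that the two forms really are equivalent, the sketch below tries every combination of `True` and `False` for `A`, `B`, `C`, and `D`. It uses `import` and a `for` loop, which we haven't covered yet, so treat it as a preview rather than something you need to understand fully right now.

```python
from itertools import product

mismatches = 0
# product(..., repeat=4) yields all 16 combinations of four booleans.
for A, B, C, D in product([False, True], repeat=4):
    implicit = A and B or C and D
    explicit = (A and B) or (C and D)
    if implicit != explicit:
        mismatches += 1

print(mismatches)   # how many combinations gave different answers
```

The count is zero, so the parentheses here change nothing for Python; they only help the human reader.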

## F-strings

We often need to combine variables, values, and strings. For example, if we have the following variables:

- `bacteria_name`
- `genome_size`

we might want to print a report, where each line summarizes the values above. We could do that like this:

```
print("bacteria: ", bacteria_name, ", genome size: ", genome_size, " bp")
```
which produces this output:
```
bacteria:  E. coli , genome size:  4500000  bp
```

This sort of construct gets a bit tedious. Plus, `print` puts a space between each of its arguments, so the space between the bacteria name and the following comma (and the doubled spaces elsewhere) is unintended and undesirable.

Python has a simple approach, called f-strings, that offers a more readable solution to this problem. If you prefix a string with the character `f`, it gives the string magic powers. Specifically, the string has the ability to **interpolate** variables inside curly brackets. Here's how we could express the previous `print` statement using an f-string:

```
print(f"bacteria: {bacteria_name}, genome size: {genome_size} bp")
```

This is shorter, less tedious, easier to read and write, and solves the formatting issue related to the comma between the two fields.

Note that you can put any Python expression inside the curly braces, so this technique is very powerful. Once you get going with Python, you'll find all sorts of wonderful ways to use f-strings.
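For instance, the braces can hold arithmetic or a comparison, not just a variable name. This is a small sketch with the same example variables; the 4,000,000 bp cutoff for a "large genome" is an arbitrary number chosen for illustration.

```python
bacteria_name = "E. coli"
genome_size = 4500000

# Each pair of braces holds an expression that is evaluated, then inserted.
report = f"{bacteria_name}: {genome_size / 1000000} Mbp, large genome: {genome_size > 4000000}"
print(report)
```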

There is also a neat trick that can help with large numbers! Adding `:,` after the value inside the curly braces means "automatically insert thousands separators":

```
print(f"bacteria: {bacteria_name}, genome size: {genome_size:,} bp")
```


In [None]:
bacteria_name = "E. coli"
genome_size = 4500000
print(f"bacteria: {bacteria_name}, genome size: {genome_size:,} bp")

bacteria: E. coli, genome size: 4,500,000 bp


[Return to the lesson listing](#lessons)

[Return to the top of the notebook](#top)