<h1>Table of Contents<span class="tocSkip"></span></h1>
<div class="toc"><ul class="toc-item"><li><span><a href="#From-strings-to-lists-to-strings-to-lists" data-toc-modified-id="From-strings-to-lists-to-strings-to-lists-1"><span class="toc-item-num">1&nbsp;&nbsp;</span>From strings to lists to strings to lists</a></span></li><li><span><a href="#If-and-else-if-branching-in-code" data-toc-modified-id="If-and-else-if-branching-in-code-2"><span class="toc-item-num">2&nbsp;&nbsp;</span>If and else if branching in code</a></span></li><li><span><a href="#Reading-from-a-file" data-toc-modified-id="Reading-from-a-file-3"><span class="toc-item-num">3&nbsp;&nbsp;</span>Reading from a file</a></span></li><li><span><a href="#Creating-errors,-checking-for-errors-and-handling-errors" data-toc-modified-id="Creating-errors,-checking-for-errors-and-handling-errors-4"><span class="toc-item-num">4&nbsp;&nbsp;</span>Creating errors, checking for errors and handling errors</a></span></li><li><span><a href="#Challenges" data-toc-modified-id="Challenges-5"><span class="toc-item-num">5&nbsp;&nbsp;</span>Challenges</a></span></li></ul></div>

> All content here is under a Creative Commons Attribution [CC-BY 4.0](https://creativecommons.org/licenses/by/4.0/) and all source code is released under a [BSD-2 clause license](https://en.wikipedia.org/wiki/BSD_licenses).
>
>Please reuse, remix, revise, and [reshare this content](https://github.com/kgdunn/python-basic-notebooks) in any way, keeping this notice.

# Module 3: Overview

We cover the following topics here:

1. Reviewing lists and strings, and interchanging between the two.
2. If-else branching in your code
3. Files: reading from a file; looping over their contents line by line.
4. Error creation, error checking and dealing with errors.

On the side we will cover some aspects of debugging.



### First, a quick warm-up, with 10 questions:

1. How many characters will be returned? ``"my_string"[0:5]``
2. And here? ``"my_string "[5:100:2]``
3. What method can you apply to strings to strip away any whitespace, like in the above string?
4. What method can you apply to strings to check if the contents of the string are all numbers?
5. What function do you use to determine the length of a string?
6. And the length of a list?
7. You want to get, and then remove, the 4th value of the list. How do you do that?
8. What happens when you have 2 lists and you write this command: ``list_A + list_B``?
9. What is the value of ``output`` after this command: ``output = my_list.reverse()``?
10. Tell your colleague how you would write a for loop to print the values in this list, but from back to front: ``[9, 7, 5, 3, 2]``

## From strings to lists to strings to lists

In the [prior module](https://yint.org/pybasic02) we focused on lists and strings separately. We also saw that they have a lot in common in terms of behaviour. But there was one key difference: lists are *mutable*, and strings are *immutable* (unchangeable).

Some things are just more intuitive with lists. For example, if we have:

```python
my_string = 'A long  sentence, with text    and faulty spacing.'
```

How many words are in that sentence? It is easier if you can convert it to a list. 

**String to list**
```python
my_string = 'A long  sentence, with text    and faulty spacing.'
string_as_list = my_string.split(' ')
print('There are {} words in the string.'.format(len(string_as_list)))
```

The ``.split(...)`` method is extremely useful, but we did not cover it last time. You can split on any character, or characters. Try it:
* ``my_string.split() # mmmmm, what does this do?``
* ``my_string.split('e')``
* ``my_string.split('en')``
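
As a small illustration of the first suggestion, the sketch below compares splitting on a single space with splitting on no argument at all, reusing ``my_string`` from above. The names ``split_on_space`` and ``split_on_whitespace`` are only chosen for this example.

```python
my_string = 'A long  sentence, with text    and faulty spacing.'

# Splitting on exactly one space leaves an empty string behind
# for every extra space in the sentence.
split_on_space = my_string.split(' ')

# Splitting with no argument treats any run of whitespace as a
# single separator, so no empty strings are produced.
split_on_whitespace = my_string.split()

print(len(split_on_space), split_on_space)
print(len(split_on_whitespace), split_on_whitespace)
```

Only the second version gives a word count that matches what you would count by hand.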


**List back to a string**

To recombine a split string, you can use the ``''.join()`` method. Notice that it is written as ``''.join()``: it is a method of the string type, and the string it is called on is placed between the items being joined.

Try these lines of code one-by-one, and ensure you can figure out what the ``join`` method does.

```python
my_list = ['Divided', 'we', 'fall,', 'but', 'united', 'we', 'stand.']
print(''.join(my_list))
print(' '.join(my_list))
print('\n'.join(my_list))
print('\t'.join(my_list))
print(')('.join(my_list))
```
Which of the above are you likely to use most often?

**Pro tip**: if you are generating an automated report, you can build up your report paragraph by paragraph:

```python
report = []
report.extend(...) # add a new section
report.extend(...) # add the next section
...

# Convert the whole report to a (long) string
print('\n'.join(report)) 

```
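
A runnable version of this idea might look like the sketch below. The section texts and the values of ``slope`` and ``p_value`` are made up purely for illustration.

```python
# Hypothetical values, used only to make the sketch runnable.
slope = 45.9
p_value = 0.00341

report = []

# Each call to .extend() adds one or more paragraphs to the report.
report.extend(['Regression report', '================='])
report.extend(['The slope was {} mg/day.'.format(slope)])
report.extend(['The p-value was {}.'.format(p_value)])

# Convert the whole report to one (long) string, one paragraph per line.
report_text = '\n'.join(report)
print(report_text)
```

Note that ``.extend(...)`` expects a list, which is why each new section is wrapped in square brackets.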

## If and else if branching in code

Like in other languages, Python also has the ability to create branches in the code. 

> if \_\_&lt;condition> \_\_ then \_\_&lt;action\>\_\_

They can also have an ``else`` part:

> if \_\_&lt;condition> \_\_ then \_\_&lt;action\>\_\_ else \_\_&lt;some other action\>\_\_

Or even multiple ``else if`` checks, which Python spells as ``elif``. These are the equivalent of the ``switch`` or ``case`` constructions found in other languages.

Indentation is important, as shown in this example.
```python
slope = ... # some code goes here to calculate the slope
if slope > 0:
    sign_of_slope = 'positive'
elif slope < 0:
    sign_of_slope = 'negative'
else:
    sign_of_slope = 'zero'
    
print('The slope was observed to be {}.'.format(sign_of_slope))
```

**Note:** you can have zero or multiple ``elif`` sections in this if-else ladder. The ``else`` part, if required, must go at the end of the ladder.

Use the above code to create a prototype for a robotic system which will automatically titrate a solution to a neutral pH, depending on the value of variable ``pH``. Print the appropriate string, depending on the condition:

* If $0 \leq \text{pH} \leq 5.5$: "The solution is acidic. Adding 2 mL of ammonium hydroxide, NH4OH."
* If $5.5 \lt \text{pH} \leq 8.5$: "The solution is neutral. Adding nothing."
* If $8.5 \lt \text{pH} \leq 14$: "The solution is basic. Adding 1.5 mL of phosphoric acid, H3PO4."
* If $\text{pH} < 0$ or $\text{pH} > 14$: "The pH value is out of range. Check your pH probe."

Run your code for some or all of these pH values to ensure all branches in your code are working as you expected: ``-4.2, 0, 3.721, 5.5, 5.500001, 8.5, 10.98765, 14, 140``

This is called code testing. In the ***Advanced section*** of the course we will come back to formal methods of code testing.
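
One informal way to test every branch is to loop over a list of test values. The sketch below does this for the slope example from earlier in this section, not for the pH exercise. The list ``test_values`` is a hypothetical choice that covers each branch, including values on either side of zero.

```python
# Hypothetical test values: at least one per branch, plus values close to zero.
test_values = [-3.5, -0.0001, 0, 0.0001, 12.4]

results = []
for slope in test_values:
    if slope > 0:
        sign_of_slope = 'positive'
    elif slope < 0:
        sign_of_slope = 'negative'
    else:
        sign_of_slope = 'zero'

    results.append(sign_of_slope)
    print('A slope of {} is {}.'.format(slope, sign_of_slope))
```

Values right at a boundary, and just on either side of it, are the ones most likely to reveal a mistake such as writing ``<`` where ``<=`` was intended.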

In the [prior module](https://yint.org/pybasic01) we were writing code to automatically write a report for us. The code generated this output:

> The regression trend of **45.9** mg/day was detected for this product, with a p-value of **0.00341**. This indicates that there is a **rising** trend over time.

Again, use the above code as starting point, but add to it. At the end, the code should be able to produce all 4 variants of the outputs shown below, depending on the value of ``slope`` and ``p_value``.
* The ``slope`` is either considered to be **rising** or **falling**.
* A ``p_value`` greater than 0.20 requires that an extra phrase be added.

*Variant 1*: The regression trend of **12.4** mg/day was detected for this product, with a p-value of **0.0141**. This indicates that there is a **rising** trend over time, which indicates an important influence.

*Variant 2*: The regression trend of **12.4** mg/day was detected for this product, with a p-value of **0.425**. This indicates that there is a **rising** trend over time, but it likely has no impact on the system.

*Variant 3*: The regression trend of **-5.2** mg/day was detected for this product, with a p-value of **0.142**. This indicates that there is a **falling** trend over time, which indicates an important influence.

*Variant 4*: The regression trend of **-5.2** mg/day was detected for this product, with a p-value of **0.209**. This indicates that there is a **falling** trend over time, but it likely has no impact on the system.

Check that your code correctly produces the output when:
* ``slope = 0.00542`` and ``p_value = 0.0419``
* ``slope = -521`` and ``p_value = 0.2000001``

## Reading from a file

We will cover simple reading from a text file here. The general, simple way to read a file containing any regular text is:

```python
filename = "myfile.txt"
f = open(filename, "r")
all_lines_as_string = f.read()
print(type(all_lines_as_string))

# Do something with the string ``all_lines_as_string``. It holds the entire file contents.

# Close the file afterwards:
f.close()
```

**Try these steps**:
1. Go to the same directory as where your Python script is being saved.
2. Create a new text file, called ``myfile.txt``, and write more than one line of text to your file.
3. Save your text file.
4. Run the above code, modifying it so that it will:

   * Print all the text in your file, ``all_lines_as_string``, in uppercase.
   * What is the length of ``all_lines_as_string``?
5. Change the above code so that the line which reads the file becomes: ``all_lines_as_list = f.readlines()``

    * Verify that ``all_lines_as_list`` is indeed a list.
    * Write a for-loop that prints the length of each line:

        ``Line 1 has __ characters``
        
        ``Line 2 has __ characters``
        etc

The above is a good start with files, but there are some shortcomings which we can improve on:

* You must have the file called ``myfile.txt`` in the same directory as where you are running Python.
* You must not forget to close the file again.


1. Move the file ``myfile.txt`` to a different directory on your computer.
2. Use the following construction to create the ``full_filename`` variable :

```python
base_folder_mac_or_linux = '/users/home/yourname'
base_folder_windows = r'C:\Users\home\yourname'  # do you remember why we use the r'...' string?
filename = 'myfile.txt'
import os
full_filename = os.path.join(base_folder_windows, filename)
f = open(full_filename, "r")
# Do something with variable ``f``
f.close()
```
3. Modify the code you wrote in the prior question to use this new structure. Choose either the ``base_folder_mac_or_linux`` or ``base_folder_windows`` variable and modify it, based on the type of computer you are working on.
4. Verify that ``full_filename`` is indeed what you expect it to be: the full path name to your text file.


We still have not solved the last shortcoming, regarding closing the file. Read this Stack Overflow page [on why you should close files](https://stackoverflow.com/questions/25070854/why-should-i-close-files-in-python). Python can automatically close the file for you if you use the structure shown below. Notice the code is essentially the same, except you replace one line, and remove the ``f.close()`` statement.

```python
# This is the preferred way to open and use files in Python

base_folder_mac_or_linux = '/users/home/yourname'
base_folder_windows = r'C:\Users\home\yourname'  # why the r'...' string?
filename = 'myfile.txt'
import os
full_filename = os.path.join(base_folder_windows, filename)
with open(full_filename, "r") as f:
    # Do something with variable ``f``, for example:
    file_contents = f.readlines()
    # What "type" is ``file_contents``?
    # Other statements go in the with block, if required.
   
# The file will be closed at this point, when the ``with`` 
# block is exited.
```

Copy and paste your file handling code above, and modify it to use the ``with`` block instead.
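
The overview promised looping over a file line by line. A minimal sketch of that, inside a ``with`` block, is shown below. So that it runs on any computer, it first writes a small example file into a temporary folder; the use of ``tempfile`` and the file contents are assumptions made only for this illustration.

```python
import os
import tempfile

# Assumption: create a small example file in a temporary folder.
# In your own code you would point to your existing text file instead.
base_folder = tempfile.mkdtemp()
full_filename = os.path.join(base_folder, 'myfile.txt')
with open(full_filename, 'w') as f:
    f.write('First line\nSecond, longer line\nThird\n')

line_lengths = []
with open(full_filename, 'r') as f:
    # Looping over ``f`` gives one line at a time, newline included.
    for line in f:
        text = line.rstrip('\n')   # remove the newline at the end
        line_lengths.append(len(text))
        print('"{}" has {} characters'.format(text, len(text)))

# The file was closed when the ``with`` block was exited.
print('Is the file closed? {}'.format(f.closed))
```

Looping directly over ``f`` avoids loading the whole file into memory at once, which matters for large files.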

## Creating errors, checking for errors and handling errors

Sometimes in our code we want to check if a variable has a certain value, or a certain condition, before we continue with the rest of the code.

1. Try this: ``print(4/0)`` and you will get a ``ZeroDivisionError``. We say that Python has **thrown** an error.

2. Try throwing another error:
```python
import math
pH = -0.024
log_pH = math.log(pH)
```
and you will get an error, specifically an error of the type called ``ValueError``.

3. Lastly, which type of error will ``[1, 2, 3].pop(4)`` throw? Does it make sense that this error is thrown?

There are several ways to deal with these errors. The crudest is to simply stop the program if a condition is not met.

```python
pH = 2.87
assert(pH > 0)

pH = -0.024
assert(pH > 0)
```

What difference do you see in the output between the two ``assert(...)`` statements?

A more sophisticated way to deal with this is to create an ``if-else`` branch in your code. In the above examples you can:

 1. check the value of the denominator before you do the division;
 2. check that the pH value is greater than zero before you take the logarithm;
 3. check if the length of your list is long enough before you do ``.pop(4)``.
 

Writing an if-else for every anticipated situation [can be messy, and lead to slower code](https://stackoverflow.com/questions/7604636/better-to-try-something-and-catch-the-exception-or-test-if-its-possible-first).
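
To make this concrete, the three checks listed above might look like the sketch below. The names ``denominator``, ``ratio`` and ``item`` are hypothetical, and ``None`` is used simply as a placeholder meaning "no result".

```python
import math

denominator = 0
pH = -0.024
my_list = [1, 2, 3]

# 1. Check the denominator before dividing.
if denominator != 0:
    ratio = 4 / denominator
else:
    ratio = None
    print('Cannot divide by zero.')

# 2. Check that the value is positive before taking the logarithm.
if pH > 0:
    log_pH = math.log(pH)
else:
    log_pH = None
    print('Cannot take the log of {}.'.format(pH))

# 3. Check that the list is long enough before popping index 4.
if len(my_list) > 4:
    item = my_list.pop(4)
else:
    item = None
    print('The list has only {} entries.'.format(len(my_list)))
```

Three simple operations already need three separate branches, which is the messiness referred to above.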

A cleaner way to deal with this is with a try-except structure. You **try** to run the instructions, and should an **except**ion occur, then you *catch* the error that was *thrown* by the software.

```python
import math

try:
    # Some code goes here that calculates or gets the pH value
    pH = -0.024
    log_pH = math.log(pH)
    print('The logged pH is {}'.format(log_pH))
except ValueError as error:
    print('Cannot calculate the log of a negative number.')
    print('Python returned this error: {}'.format(error))   
```

All the code in the **try** block will be executed and completed if no exceptions occur. The code in the **except** block will only run if the exception is triggered.
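
Both paths can be seen in one run with a sketch like the one below, which loops over a few hypothetical pH readings (the list ``readings`` is made up for illustration). The valid values complete the **try** block; the invalid one jumps to the **except** block, and the loop then carries on.

```python
import math

# Hypothetical pH readings: the second one is invalid.
readings = [2.87, -0.024, 7.0]

logged = []
for pH in readings:
    try:
        log_pH = math.log(pH)
        # The next two lines are skipped if math.log raised an exception.
        logged.append(log_pH)
        print('The logged pH is {}'.format(log_pH))
    except ValueError as error:
        print('Skipping pH = {}. Python returned: {}'.format(pH, error))

print('{} of {} readings were logged.'.format(len(logged), len(readings)))
```

Without the try-except structure, the program would have stopped at the second reading and the third would never have been processed.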

**Try it yourself** 

Try writing a try-except block for the situation where you calculate the roots of the quadratic equation: $ax^2 + bx + c = 0$. In other words, what are the values of $x$ that set the equation equal to zero. We saw this in a [prior session](https://yint.org/pybasic01).

$$x={\frac {-b\pm {\sqrt {b^{2}-4ac\ }}}{2a}}$$

Write the code, using a try-except structure, to calculate the value of $x$ for these 2 situations:
* a=3, b=6, c=3  `` # this will go through the 'try' block``
* a=0, b=1, c=2   ``# this will go through the 'except' block``


*Another try-except* exercise, but this time for files. 

```python
try:
    with open('non-existent-file.txt') as file:
        read_data = file.read()
except FileNotFoundError as error_nofile:
    print("Could not open file: {}".format(error_nofile))
```

Using try-except structures in your code makes your code robust.

For later, read this page, https://realpython.com/python-exceptions/, which nicely shows how to extend your knowledge, and use ``try-except-else-finally`` structures.

## Challenges

Next you should proceed to do either Challenge 1, or Challenge 2 (***or both!***). 

After that you should complete Challenge 3. Challenge 4 is a variation of Challenge 3, which you can easily complete as well.

# Challenge 1

Several websites provide random DNA sequences that you can use in your code. We would like to calculate various statistics on the DNA string, e.g. ``dna_string = 'CGAGATCAGATACGATTCTTATATTCTCAATGAGGAGCCAT'``.
* The number of C, G, T and A bases.
* The percentage of C, G, T and A bases. Does this look right?
* The number of (G and C) bases divided by the total base length.
* The number of (A and T) bases divided by the total length.
* The ratio of A/T and the ratio G/C.


1. It is always a good tip to develop your code on some input that you already know the answer to. So we will create a string of 4000 entries, with approximately 1000 C, G, T and A bases in the string. Then we know the above 5 statistics should approximately be: 

    * [1000, 1000, 1000, 1000]
    * [25, 25, 25, 25]
    * 2000/4000
    * 2000/4000
    * 1000/1000 and 1000/1000

    To create our "known input" we will use the ``numpy`` library in Python. We will see this library later on, but for now you can just use this code as-is:

    ```python
    import numpy as np 
    bases = ['C', 'G', 'T', 'A']
    length = 4000

    # Uniformly select letters from the above list (creates a balanced sample)
    dna_list = np.random.choice(bases, length).tolist()
    dna_string = ''.join(dna_list)   # see how useful the .join() method is?

    # Next, write the code here to calculate the statistics on this DNA sequence
    ```

2. Generate a string of 10000 DNA bases from the internet (https://www.bioinformatics.org/sms2/random_dna.html) and copy/paste them directly to a file on your computer. 

    Open that file, using the file reading code above, to calculate various statistics on this string. Do not hard-code any variables into your code: your code should be reusable for any length string.
    
    Do the statistics from the Bioinformatics site seem to come from a uniform distribution (25% chance for C, G, A or T), or does the distribution of the bases follow the one [discovered by Chargaff](https://en.wikipedia.org/wiki/Chargaff%27s_rules#Percentages_of_bases_in_DNA)? If so, which organism do they approximate?


# Challenge 2

This challenge is to read in a constant stream of values and calculate their moving average. This is, again, based on a real case that happens rather frequently.

The [concentration of ammonia](http://openmv.net/info/ammonia) values can be downloaded and saved to your computer. You should check that what you get here in Python matches what you see in Excel, or some other software that can open the CSV file for you.

Using the [code shown above](#Reading-from-a-file), create a ``with`` block, and read the values from the file line-by-line:

```python
# Read the file directly from your local computer
filename = 'ammonia.csv'
with open(filename) as f:
    for index, concentration in enumerate(f.readlines()):
        
        # Skip the first line of the file: it is a text heading.
        if index == 0:
            continue
            
        # Convert the text to a float, and then do something with it ...
        print(float(concentration))
```

*Building up the problem*: 
1. Calculate the cumulative sum of the values in the CSV file.
2. Now modify your code and at the end divide by the total number of samples you added up, so you report the global average of all values in the file.
3. Next, solve the challenge: the idea is to calculate the moving average over $n=5$ values; called a window of 5 values. Accumulate the first 5 entries in the window and calculate the average. Print the average to the screen. Then throw away the first entry, add the 6th entry to update your window. Calculate the average based on the 2nd to the 6th value. Keep going until you run out of values. 

 As a check: the first moving average value is 36.92, then the next one is 38.476, etc.

 Modify your code only in 1 place to repeat the calculations, but with a window size of $n=15$ steps. In other words, your window size should not be *hard-coded* into your Python code.


# Challenge 3

This challenge is to create a very crude integrator for an ordinary differential equation. Newton's law of cooling says that when an object, like a bottle of water, is placed in a cold environment, like a fridge, the temperature of the water, $T$, changes over time $t$ according to:
$$ \dfrac{dT}{dt} = -k (T-F)$$

The fridge has a constant temperature, $F=5$°C; for this system, the value of $k = 0.08$. For a short change in time, $\delta t = 2$ minutes, the equation can be approximated as: $$ \dfrac{\Delta T}{\delta t} = -k (T - F)$$ Writing $\Delta T = T_{i+1} - T_i$ and multiplying through by $\delta t$ gives:
$$T_{i+1} - T_i = -k (\delta t)(T_i - F)$$ 

which shows how the temperature at time point $i+1$ (one step in the future) is related to the temperature now, at time $i$. 

You can rewrite the equation as: $$ T_{i+1} = T_i -k (\delta t)(T_i - F)$$
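
As a worked check of a single step, using the values given above ($k = 0.08$, $\delta t = 2$ minutes, $F = 5$°C) and a starting temperature of $T_0 = 25$°C:

$$T_1 = T_0 - k(\delta t)(T_0 - F) = 25 - (0.08)(2)(25 - 5) = 25 - 3.2 = 21.8$$

Each later step repeats the same calculation, with the newest temperature taking the place of $T_0$.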

In a loop, show how the temperature changes over time, starting from the temperature $T_{i=0} = 25$°C. Your output should look something like this:

> At time 0 minutes the temperature is 25.0
> 
> At time 2 minutes the temperature is 21.8

1. How long does it take for your water to reach the temperature of the fridge?
2. What happens if you use a different $\delta t$ value during the integration? Try a really small value, or a larger value.
3. For a more "powerful" fridge which cools quicker: do you need a bigger or smaller value of $k$?
4. How long will it take a cup of water which has just boiled to cool down?

# Challenge 4

Like in the previous challenge we want to integrate an equation, but this time for bacteria growing on a plate.

The equation for growth is

$$ \dfrac{dP}{dt} = rP $$

where $P$ is the number of bacteria in the population, and $r$ is their rate of growth per bacterium [1/minute]. Integrating this equation will show exponential growth. This is not realistic. Eventually the bacteria will run out of space and their food source. So the equation is modified:

$$ \dfrac{dP}{dt} = rP - aP^2$$
where they are limited by the factor $a$ in the equation.

The differential equation can be re-written as: 
$$P_{i+1} - P_i = \left[\,rP_i  -a\,P_i^2\,\right]\delta t$$ 


which shows how the population at time point $i+1$ (one step in the future) is related to the population size now, at time $i$ over a short interval of time $\delta t$ minutes.  You can read more about these <a href = "https://math.libretexts.org/Bookshelves/Calculus/Book%3A_Calculus_(OpenStax)/8%3A_Introduction_to_Differential_Equations/8.4%3A_The_Logistic_Equation">logistic equation models</a>.

In a loop, show how the population changes over time, starting from an initial population of $P_{i=0} = 500$ bacteria. The growth rate for this culture is $r=0.032$ and the coefficient $a = 1.4 \times 10^{-7}$.
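
As a worked check of a single step, assume a step size of $\delta t = 10$ minutes. This step size is an assumption: it is not stated in the challenge, but it is the spacing used in the sample output shown in this challenge.

$$P_1 = P_0 + \left[\,rP_0 - a\,P_0^2\,\right]\delta t = 500 + \left[(0.032)(500) - (1.4 \times 10^{-7})(500)^2\right](10) = 500 + (16 - 0.035)(10) = 659.65$$

which rounds to 660 bacteria.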


Your printed output should look something like this:

> At time 10 minutes there are 660 bacteria
>
> At time 20 minutes there are ___ bacteria

Try integrating over 

* different durations of time
* using different step sizes of $\delta t$
* Do you get a steady population size?
* Is that ultimate population size dependent on the number of bacteria you start with?

Later on in the course we will see how to collect these values and plot them. But looking at the numbers, what shape does the curve have? Is it what you expect?

To cover during the interactive session:

* Debugging in Spyder.
