<h1>Table of Contents<span class="tocSkip"></span></h1>
<div class="toc"><ul class="toc-item"><li><span><a href="#Strings" data-toc-modified-id="Strings-1"><span class="toc-item-num">1&nbsp;&nbsp;</span>Strings</a></span></li><li><span><a href="#Lists" data-toc-modified-id="Lists-2"><span class="toc-item-num">2&nbsp;&nbsp;</span>Lists</a></span></li><li><span><a href="#For-loops:-iterating" data-toc-modified-id="For-loops:-iterating-3"><span class="toc-item-num">3&nbsp;&nbsp;</span>For loops: iterating</a></span></li><li><span><a href="#If-and-else-if-branching-in-code" data-toc-modified-id="If-and-else-if-branching-in-code-4"><span class="toc-item-num">4&nbsp;&nbsp;</span>If and else if branching in code</a></span></li><li><span><a href="#Commenting-and-variable-names" data-toc-modified-id="Commenting-and-variable-names-5"><span class="toc-item-num">5&nbsp;&nbsp;</span>Commenting and variable names</a></span></li></ul></div>

> All content here is under a Creative Commons Attribution [CC-BY 4.0](https://creativecommons.org/licenses/by/4.0/) and all source code is released under a [BSD-2 clause license](https://en.wikipedia.org/wiki/BSD_licenses).
>
>Please reuse, remix, revise, and [reshare this content](https://github.com/kgdunn/python-basic-notebooks) in any way, keeping this notice.

# Overview of this session

We cover a diverse range of topics: 

* strings,
* lists [also called *vectors*, if you are used to MATLAB or C or Java],
* for-loops,
* if-else branching in your code,
* files.

They seem unrelated, but they hang together conceptually: they are all about sequences, or collections. Strings are sequences of characters, lists are sequences of items, loops process a sequence, and files can be looped over line by line.

In between we will cover some topics related to commenting, error checking and debugging.

* intentionally creating errors.
* Reading a file: getting a line, getting lines into a list




## Strings


Strings are some of the simplest objects in Python. Last module you created several of them. Create this string in Python:

```python
s = """Secretly under development for the past three years, Bezos said the 
"Blue Moon" lander, using a powerful new hydrogen-powered engine generating up
to 10,000 pounds of thrust, will be capable of landing up to 6.5 metric tons 
of equipment on the lunar surface."""
```
Now use the above string to perform the following actions. Look up the Standard library help files for ``strings`` (like we showed last time) to find the methods required.

1. Print it to screen completely in upper case.
1. Print it to screen but with lower and uppercase characters switched around.
1. Try the following: ``print(s * 8)``.
1. Try the following: ``print(s + s)``. *Do these two mathematical operations make sense for strings?*
1. What is the length of this string?
1. How many times does the word "the" appear in the string?
1. At which position in the string does the word ``Secretly`` appear? *How does this differ from MATLAB?*
1. At which position in the string does the word ``Bezos`` appear?
1. Return a boolean ``True`` or ``False`` if the string ``endswith`` a full stop.
1. Return the string, replacing the instance of 'hydrogen' with 'nuclear'.
1. Replace every space in the above sentence with a newline character, and reprint the sentence to the screen.

The above are all effectively done using what are called ***methods***.

> A method is an *attribute* of an *object*.

In the above, a ``string`` is your *object* and objects have one or more attributes.

Some tips:

1. You can get a **list** [we cover lists next!] of all attributes using the ``dir(...)`` command.

```python
s = """Secretly under development for ... the lunar surface."""
dir(s)
['__add__', '__class__', '__contains__', '__delattr__', '__dir__', '__doc__', '__eq__', '__format__', '__ge__', '__getattribute__', '__getitem__', '__getnewargs__', '__gt__', '__hash__', '__init__', '__iter__', '__le__', '__len__', '__lt__', '__mod__', '__mul__', '__ne__', '__new__', '__reduce__', '__reduce_ex__', '__repr__', '__rmod__', '__rmul__', '__setattr__', '__sizeof__', '__str__', '__subclasshook__', 'capitalize', 'casefold', 'center', 'count', 'encode', 'endswith', 'expandtabs', 'find', 'format', 'format_map', 'index', 'isalnum', 'isalpha', 'isdecimal', 'isdigit', 'isidentifier', 'islower', 'isnumeric', 'isprintable', 'isspace', 'istitle', 'isupper', 'join', 'ljust', 'lower', 'lstrip', 'maketrans', 'partition', 'replace', 'rfind', 'rindex', 'rjust', 'rpartition', 'rsplit', 'rstrip', 'split', 'splitlines', 'startswith', 'strip', 'swapcase', 'title', 'translate', 'upper', 'zfill']
```
You can ignore all the attributes beginning and ending with a double underscore, for example ``__add__``. The attributes which are of practical use to you are the ones starting from ``capitalize``, all the way to the end.

2. You don't need to create a string ``s`` first to get a list of the attributes. You can also use this shortcut:

```python
dir('')
dir(str)
```

3. If you see an attribute that looks interesting, you can request help on it:  ``help(''.startswith)`` or ``help("".startswith)``. Notice the ``''`` in the brackets: it creates an empty string, and then accesses the attribute ``.startswith`` and then asks for help on that. 

You will get a piece of help text printed to the screen. This is helpful later on, when you are comfortable with Python. In the beginning it is more helpful to use a search engine, which will give you a page with examples. The built-in Python help is usually very brief.


Use this knowledge now to figure out what the difference is between ``s.find`` and ``s.index``. Make sense?
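
Once you have read the help text, a small experiment can confirm your understanding. The sketch below uses a short, made-up string called ``phrase`` (not the string ``s`` from above) and compares the two methods when the substring is present and when it is absent.

```python
# Hypothetical short string, used only for illustration.
phrase = 'lunar lander'

# When the substring exists, both methods return the same position.
found_position = phrase.find('lander')      # 6
index_position = phrase.index('lander')     # 6

# When the substring is absent, they behave differently.
missing_position = phrase.find('rocket')    # -1, and no error is raised
try:
    phrase.index('rocket')
    index_outcome = 'found'
except ValueError:
    index_outcome = 'ValueError raised'

print(found_position, index_position, missing_position, index_outcome)
```

Which one to use is a design choice: ``find`` suits a quick check of the returned value, while ``index`` stops the program loudly if the text is missing.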


You can do what is called *slicing* on a string. Slicing is the ability to get sub-parts of a string:

```python
word = 'landing'
print(word[1:4])
```

* How long is the printed text on the screen?
* Again, for MATLAB users: how does that differ from what you are used to?
* What is returned with ``word[3:]``?
* What is returned with ``word[3:99]``?
* What is returned with ``word[2:6:3]``?
* And try this: ``word[6:2:-1]``
* And lastly ``word[-4:-7:-1]``
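
To check your reasoning about the ``[start:stop:step]`` notation, here is a minimal sketch on a different word. The variable name ``sample`` is chosen only for this illustration. The key points are that counting starts at 0, that the ``stop`` position is excluded, and that a stop beyond the end of the string is silently accepted.

```python
# Positions:  p=0, y=1, t=2, h=3, o=4, n=5
sample = 'python'

middle = sample[1:4]          # 'yth': positions 1, 2 and 3; position 4 is excluded
to_the_end = sample[3:]       # 'hon': an omitted stop means "up to the end"
past_the_end = sample[3:99]   # 'hon': a stop beyond the end is not an error
every_second = sample[0:6:2]  # 'pto': positions 0, 2 and 4
backwards = sample[5:1:-1]    # 'noht': positions 5, 4, 3 and 2
reversed_word = sample[::-1]  # 'nohtyp': the whole string, reversed

print(middle, to_the_end, past_the_end, every_second, backwards, reversed_word)
```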

Strings are also a natural way to store DNA sequences. Create this sequence in Python:

```python
seq = """TAGGGGCCTCCAATTCATCCAACACTCTACGCCTTCTCCAAGAGCTAGTAGGGCACCCTGCAGTTGGAAAGGGAACTATTTCGTAGGGCGAGCCCATACCGTCTCTCTTGCGGAAGACTTAACACGATAGGAAGCTGGAATAGTTTCGAACGATGGTTATTAATCCTAATAACGGAACGCTGTCTGGAGGATGAGTGTGACGGAGTGTAACTCGATGAGTTACCCGCTAATCGAACTGGGCGAGAGATCCCAGCGCTGATGCACTCGATCCCGAGGCCTGACCCGACATATCAGCTCAGACTAGAGCGGGGCTGTTGACGTTTGGGGTTGAAAAAATCTATTGTACCAATCGGCTTCAACGTGCTCCACGGCTGGCGCCTGAGGAGGGGCCCACACCGAGGAAGTAGACTGTTGCACGTTGGCGATGGCGGTAGCTAACTAAGTCGCCTGCCACAACAACAGTATCAAAGCCGTATAAAGGGAACATCCACACTTTAGTGAATCGAAGCGCGGCATCAGAATTTCCTTTTGGATACCTGATACAAAGCCCATCGTGGTCCTTAGACTTCGTGCACATACAGCTGCACCGCACGCATGTGGAATTAGAGGCGAAGTACGATTCCTAGACCGACGTACGATACAACTATGTGGATGTGACGAGCTTCTTTTATATGCTTCGCCCGCCGGACCGGCCTCGCGATGGCGTAG"""
```

* What is the first occurrence of ``GATTAG`` in the sequence?
* How many times does ``TTTT`` occur?
* Replace all ``A`` entries with ``T``'s and all ``C`` entries with ``G``'s.
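
One subtlety to keep in mind when counting: ``.count()`` only counts *non-overlapping* occurrences. The sketch below illustrates this on a short, made-up fragment (``fragment`` is not part of the sequence above), and shows one possible way to count overlapping matches with ``.startswith()``.

```python
# Hypothetical fragment with a run of five T's at positions 2 to 6.
fragment = 'GATTTTTACA'

# .count() moves past each match before searching again.
non_overlapping = fragment.count('TTTT')    # 1

# Checking every start position also finds matches that overlap.
overlapping = sum(1 for start in range(len(fragment))
                  if fragment.startswith('TTTT', start))    # 2

print(non_overlapping, overlapping)
```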

## Lists

We will cover creating, adding, accessing and using lists of objects.

You have seen this before: create a list with the square bracket characters: ``[`` and ``]``.

For example: ``words = ['Mary', 'loved', 'chocolate.']``

One of the most useful functions in Python is ``len(...)``. Verify that it returns an integer value of 3. Does it have the **type** you expect?

The entries in the list can be mixed types (contrast this to most other programming languages!)
    
```python
group = ['yeast', 'bacillus', 994, 'aspergillus' ]
```

An important test is to check if the list contains something:

```python
'aspergillus' in group
499 in group
```

Like we saw with strings, you can use the ``*`` and ``+`` operators:

```python
group * 3
group + group   # might not do what you expect!
group - group   # oooops
```

And like strings, you access entries by their position, counting from 0:
```python
group[0]

# but this is also possible:
group[-3]

# however, is this expected?
group[4]
```
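
The indexing rules can be summarized with a small sketch. It uses a made-up list called ``samples``: valid positions run from ``0`` to ``len(samples) - 1``, negative positions count backwards from the end, and anything outside that range raises an ``IndexError``.

```python
# Hypothetical list with 3 entries: valid positions are 0, 1 and 2.
samples = ['alpha', 'beta', 'gamma']

first = samples[0]              # 'alpha'
last = samples[-1]              # 'gamma': -1 is the last entry
also_first = samples[-3]        # 'alpha': -len(samples) is the first entry

try:
    samples[len(samples)]       # position 3 does not exist
    index_message = 'no error'
except IndexError as error:
    index_message = str(error)

print(first, last, also_first, index_message)
```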

Lists also have some methods that you can use. Lists in fact have far fewer methods than strings. Remember how to get a list of the methods, as we did for strings?

```python
dir(....) # what do you fill in here?
```

How many methods do you see which you can apply to a list? 

Let's try a few of them out:
1. ``append`` a new entry to the list you created above, called ``group``: add the entry "Candida albicans"
1. Create a new list ``reptiles = ['crocodile', 'turtle']`` and then try: ``group.extend(reptiles)``.
1. Print the list. Remove the ``crocodile`` entry from the list. Print it again to verify it succeeded. 
1. Now try to remove the entry again. What happens?
1. Use the following command: ``group.reverse()``, and print the ``group`` variable to the screen.
1. Now try this instead: ``group = group.reverse()`` and print the ``group`` variable to the screen. What happened this time?
1. So you are back to square one: make a new list variable ``group = ['yeast', 'bacillus', 'aspergillus' ]`` and try ``group.sort()``. Notice that ``.sort()``, like the ``.reverse()`` method, operates *in-place*: there is no need to assign the output of the action to a new variable. In fact, you cannot.
1. Here's something to be aware of: create ``group = ['yeast', 'bacillus', 994, 'aspergillus' ]``; and now try ``group.sort()``. What does the error message tell you?
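
If you do want a new list and wish to leave the original untouched, Python has the built-in functions ``sorted()`` and ``reversed()``. The following is a minimal sketch with a made-up list called ``organisms``, contrasting them with the in-place methods.

```python
# Hypothetical list, used only for illustration.
organisms = ['yeast', 'aspergillus', 'bacillus']

ordered_copy = sorted(organisms)            # a new, sorted list
reversed_copy = list(reversed(organisms))   # a new, reversed list
unchanged = list(organisms)                 # the original order is still intact here

returned_value = organisms.sort()           # sorts in-place and returns None

print(ordered_copy)
print(reversed_copy)
print(returned_value, organisms)
```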


Lists behave like a stack: you can add things to the end using ``.append()`` and you can remove them again with ``.pop()``.

Think of a stack of plates: last appended, first removed.

Try it:
```python
species  = ['chimp', 'bacillus', 'aspergillus']
species.append('hoooman')
first_out = species.pop()
print(first_out)
```
* What is the length of the list after running this code?
* Try adding a new entry ``arachnid`` between ``chimp`` and ``bacillus`` using the ``.insert()`` command. Print the list to verify it. 
> If you don't know how to use the ``.insert()`` method, but you know it exists, you can type ``help([].insert)`` at the command prompt to get quick help. Or you can search the web, which gives more comprehensive help, with examples.
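
The stack-of-plates idea, together with ``.insert()``, can be sketched on a separate, made-up list so that the exercise above is left for you. Note that ``.insert(position, item)`` places the item *before* the entry currently at that position.

```python
# Hypothetical stack of plates, used only for illustration.
plates = ['plate 1', 'plate 2']

plates.append('plate 3')            # goes on top of the stack
top_plate = plates.pop()            # 'plate 3': last appended, first removed

plates.insert(1, 'plate 1.5')       # ends up between 'plate 1' and 'plate 2'

print(top_plate)
print(plates)
```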



* Contrast this to strings: immutable vs mutable.
* See mikedane resources
* Find the exact entry
* Find the closest entry
* Find entries which appear more than once in a list
* Common elements into two different lists. Join function equivalent?
* Count how many entries in a list are greater than a threshold. List comprehension
* Contrast with tuples. Seen before. Immutable concept introduced
* Index slices. Doesn't return last element
* Lists: append, insert, extend. Sort is special..clear() too
* Read file for the above DNA string
    import urllib.request
    with urllib.request.urlopen('http://python.org/') as response:
        html = response.read()

List comprehensions:
    
    
Make a list with the first 10 entries of the multiplication table of 2, using a Python list comprehension:
``print([2*x for x in range(1,11)])``
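
A list comprehension can also filter entries with an ``if`` at the end, which is one way to count how many entries exceed a threshold. The sketch below uses made-up values; ``readings`` and ``threshold`` are names chosen only for this illustration.

```python
# Hypothetical measurements, used only for illustration.
readings = [12, 47, 5, 30, 21]
threshold = 20

# Transform every entry: [expression for item in sequence]
squares = [n ** 2 for n in range(1, 6)]                         # [1, 4, 9, 16, 25]

# Keep only some entries: [expression for item in sequence if condition]
above = [value for value in readings if value > threshold]      # [47, 30, 21]
count_above = len(above)                                        # 3

print(squares, above, count_above)
```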

## For loops: iterating

The ``for`` loop is used to run a piece of code a certain number of times. The basic structure with an example, to print the integer values from 3 up to, and including 8 is:

```python
# This is one way to do it:
for i in range(3, 9):
    # You can have many lines of code here. Only 2 statements are shown.
    print(i)
    print('-----')
    
```

Before the command ``print(i)`` there is an indent of 4 spaces (a tab character also works). Please use spaces and not tabs, especially if you will share code with colleagues. With 4 spaces, the letter ``p`` from ``print`` sits exactly under the ``i`` in ``for i``.

That ``i`` is the loop counter. The ``range(3, 9)`` tells how many times the loop will iterate.

Use ``list(range(3, 9))`` to see a list representation of the ``range()`` function. Try creating these ranges:

* Every integer from 0, up to and including 12.
* Every integer from 0, up to and including 12, in steps of 2
* Every integer from 12 down to and including 0, in steps of -3
* Use a ``range`` command to create the values ``[-10, -40, -70]``
* Values between 0.5, up to and including 9.5, in steps of 0.5
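
As a reminder of the general pattern, ``range(start, stop, step)`` always excludes the ``stop`` value and only accepts integers. The sketch below uses numbers that differ from the exercises above; the comprehension in the last lines is one possible workaround for fractional steps, not the only one.

```python
up = list(range(2, 7))              # [2, 3, 4, 5, 6]: the stop value 7 is excluded
stepped = list(range(1, 10, 4))     # [1, 5, 9]
down = list(range(5, 0, -2))        # [5, 3, 1]: a negative step counts downwards

try:
    range(0.0, 1.0, 0.25)           # floating point values are not accepted
    float_message = 'no error'
except TypeError as error:
    float_message = str(error)

# One workaround: create integers, then scale each of them.
quarters = [n * 0.25 for n in range(0, 5)]      # [0.0, 0.25, 0.5, 0.75, 1.0]

print(up, stepped, down)
print(float_message)
print(quarters)
```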

1. Inside the for loop you can have one or more statements. In the above there are 2 statements and a comment. It is usual to start your comment, if one is required, with an indent as well. This way it is clear the comment refers to the contents of the for-loop.

2. You can name the variable that you iterate with anything you like, as long as it is a valid variable name. Remember those from [last time](https://yint.org/pybasic01)?

You can loop over many types of objects in Python. Try this:

```python
reptiles = ['crocodile', 'turtle', 12.34, 'lizard', 'snake', False]
for animal in reptiles:
    print('The "animal" object is of type ' + str(type(animal)))
```

and here you can see *dynamic typing* at its finest. Very useful for coding.

You can also iterate over the entries of a string!
```python
sequence =  "TAGGGGCCTCCA"
number = 1
for base in sequence:
    print('Base number {} is {}'.format(number, base))
    number += 1
```

In the above we introduced another concept: that you can print with the ``.format()`` command. We will see more of this later, but then it won't be a surprise.


Now that you have seen how you can iterate over the items of a list, let's try to put this to use:

1. Print the 3-times table, from 1 up to 12, like you learned in school:
> 3 times 1 is 3
>
> 3 times 2 is 6
>
> 3 times 3 is 9
> ...
2. If you haven't done so already, re-write your code to use the ``.format()`` command, as demonstrated above.
3. With 1 line of code find at which position in the list the value of 42 appears: ``[0, 3, 9, 12, 27, 35, 42, 50, 66]``
4. *Based on a real example from just last week*: find the value in the previous list closest to ``19``. Note: don't worry about short code, or efficiency. Just find the answer. In the real example the list was thousands of entries long and was to find the closest time within $\pm$ 5 minutes. Then you need to worry about efficiency.



**Advanced tip:** sometimes you want to iterate through a list, but also know which entry you are iterating on. You can do both simultaneously with the ``enumerate`` command.

```python
names = ['Leonardo', 'Carl', 'Amiah', 'Yaretzi', 'Destiny', 'Alan']
for index, name in enumerate(names):
    print('{} is number {} in the list'.format(name, index+1))
```

What ``enumerate`` does is to create a ``tuple`` with 2 entries on every iteration. These two entries are unpacked automatically: the first one is an ``integer`` assigned to ``index`` and the second one is assigned to ``name`` in this example. You are free to choose both variable names.
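
You can make these tuples visible by wrapping ``enumerate`` in ``list(...)``. The sketch below uses a made-up list called ``colours``, and also shows the optional ``start`` argument, which avoids having to write ``index+1``.

```python
# Hypothetical list, used only for illustration.
colours = ['red', 'green', 'blue']

# Each entry produced by enumerate is a tuple: (position, item)
pairs = list(enumerate(colours))
print(pairs)        # [(0, 'red'), (1, 'green'), (2, 'blue')]

# The counter can begin at 1 instead of 0.
numbered = []
for position, colour in enumerate(colours, start=1):
    numbered.append('{}: {}'.format(position, colour))

print(numbered)     # ['1: red', '2: green', '3: blue']
```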

## If and else if branching in code

Like in other languages, Python also has the ability to create branches in the code. 

> if \_\_&lt;condition> \_\_ then \_\_&lt;action\>\_\_

They can also have an ``else`` part:

> if \_\_&lt;condition> \_\_ then \_\_&lt;action\>\_\_ else \_\_&lt;some other action\>\_\_

Or even multiple ``if else`` checks, like the ``switch`` or ``case`` constructions found in other languages.

Indentation is important, as shown in this example.
```python
slope = ... # some code goes here to calculate the slope
if slope > 0:
    sign_of_slope = 'positive'
elif slope < 0:
    sign_of_slope = 'negative'
else:
    sign_of_slope = 'zero'
    
print('The slope was observed to be {}.'.format(sign_of_slope))
```

In the prior module we were writing code to automatically write a report for us. The code generated this output:

> The regression trend of **45.9** mg/day was detected for this product, with a p-value of **0.00341**. This indicates that there is an important **rising** trend over time.

Use the above code as starting point, but add to it. At the end the code should be able to produce all 4 variants of the outputs, depending on the value of ``slope`` and ``p_value``. The ``slope`` is either considered to be **rising** or **falling**, and a ``p_value`` greater than 0.20 requires that an extra phrase be added.

Variant 1: The regression trend of **12.4** mg/day was detected for this product, with a p-value of **0.0141**. This indicates that there is an important **rising** trend over time.

Variant 2: The regression trend of **12.4** mg/day was detected for this product, with a p-value of **0.425**. This indicates that there is an important **rising** trend over time, but it likely has no impact on the system.

Variant 3: The regression trend of **-5.2** mg/day was detected for this product, with a p-value of **0.142**. This indicates that there is an important **falling** trend over time.

Variant 4: The regression trend of **-5.2** mg/day was detected for this product, with a p-value of **0.209**. This indicates that there is an important **falling** trend over time, but it likely has no impact on the system.

Check that your code correctly produces the output when:
* ``slope = 0.00542`` and ``p_value = 0.0419``
* ``slope = -521`` and ``p_value = 0.2000001``


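As a further example of a condition controlling which code runs, here is Newton's method for calculating a square root. It uses a ``while`` loop, which repeats for as long as its condition is ``True``: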

```python
import math
y = 13.0                        # the number whose square root we want
n = 3                           # number of significant figures
rel_error = 0.5 * 10 ** (2-n)   # stopping tolerance, expressed in percent
x = y / 2.0                     # initial guess
x_prev = 0.0
iterations = 0                  # 'iter' would hide Python's built-in iter()
while 100 * abs(x - x_prev)/x > rel_error:
    x_prev = x
    x = (x + y/x) / 2.0         # Newton's update step
    print(100 * abs(x - x_prev)/x)
    iterations += 1
    
print('Used %d iterations to calculate sqrt(%f) = %.20f; '
      'true value = %.20f\n ' % (iterations, y, x, math.sqrt(y)))
```
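
The update rule in this method is $x_{\text{new}} = \frac{1}{2}\left(x + \frac{y}{x}\right)$: each new estimate is the average of the current estimate and $y$ divided by it. If you want to reuse the calculation, one option is to wrap it in a function. The following is only a sketch: the name ``newton_sqrt`` and the arguments ``tolerance`` and ``max_iterations`` are choices made for this illustration.

```python
import math

def newton_sqrt(value, tolerance=1e-12, max_iterations=50):
    """Return (estimate, iterations_used) for the square root of value."""
    if value < 0:
        raise ValueError('Cannot take the square root of a negative number.')
    if value == 0:
        return 0.0, 0

    estimate = value / 2.0          # initial guess
    for iteration in range(1, max_iterations + 1):
        previous = estimate
        estimate = (estimate + value / estimate) / 2.0
        # Stop once the relative change between estimates is small enough.
        if abs(estimate - previous) <= tolerance * abs(estimate):
            return estimate, iteration

    return estimate, max_iterations

root, steps = newton_sqrt(13.0)
print('Estimate {} after {} iterations; math.sqrt gives {}'.format(
    root, steps, math.sqrt(13.0)))
```

The ``if`` statements at the top guard against inputs for which the update rule would fail, such as a division by zero.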

## Commenting and variable names

Comments are often as important as the code itself, but they take time to write.

The choice of variable names is related to the topic of comments. In many ways, the syntax of Python makes the code self-documenting, meaning you do not need to add comments at all. But it definitely is assisted by choosing meaningful variable names:
>```python
>for genome in range(len(sequenced_genomes)):
>    <do something with genome>
>```

This quite clearly shows that we are iterating over all the genomes in some container of sequenced genomes (it could be a list, tuple, or set, for example).

But here the code structure is identical:
>```python
>for k in range(len(sequences)):
>    <do something with k>
>```
Later on in the code it might not be clear what ``k`` represents. 
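
As a runnable illustration of the same idea, the sketch below loops directly over the entries, so that the loop variable really is a genome and not a position number. The contents of ``sequenced_genomes`` are made up for this example.

```python
# Hypothetical data, used only for illustration.
sequenced_genomes = ['ATGC', 'GGCTA', 'TTAGGC']

# Looping over the list itself: each 'genome' is an entry of the list.
genome_lengths = []
for genome in sequenced_genomes:
    genome_lengths.append(len(genome))

print(genome_lengths)       # [4, 5, 6]

# If the position is needed too, both names can still be descriptive.
for genome_number, genome in enumerate(sequenced_genomes, start=1):
    print('Genome {} has {} bases'.format(genome_number, len(genome)))
```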
    
Comments should be added in these cases:
* refer to a publication or internal company report, 
* refer to a website where you got inspiration/based your algorithm on
 

To cover if time:

* Creating code cells in Spyder: ``# %% text here (in Spyder)``
* Debugging in Spyder.

# Python Collections (Arrays)

There are four collection data types in the Python programming language:

- List is a collection which is ordered and changeable. Allows duplicate members.
- Tuple is a collection which is ordered and unchangeable. Allows duplicate members.
- Set is a collection which is unordered and unindexed. No duplicate members.
- Dictionary is a collection of key-value pairs which is changeable and indexed by key. It preserves insertion order since Python 3.7. No duplicate keys.
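
A minimal sketch of the four types, built from the same made-up entries so the differences stand out. The genome sizes in the dictionary are invented numbers, used only for illustration.

```python
species_list = ['yeast', 'bacillus', 'yeast']       # list: square brackets
species_tuple = ('yeast', 'bacillus', 'yeast')      # tuple: round brackets
species_set = {'yeast', 'bacillus', 'yeast'}        # set: the duplicate is dropped
genome_sizes = {'yeast': 12.1, 'bacillus': 4.2}     # dictionary: key-value pairs (invented values)

species_list[0] = 'aspergillus'         # lists are changeable

try:
    species_tuple[0] = 'aspergillus'    # tuples are not
    tuple_outcome = 'changed'
except TypeError:
    tuple_outcome = 'TypeError raised'

print(species_list)
print(tuple_outcome)
print(len(species_set))                 # 2
print(genome_sizes['yeast'])            # entries are looked up by key
```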
