# Python basics

## Simple data types

Python has a small number of basic data types: `bool`, `int`, `float`, `complex`, and `str`.

### `bool`

This type has only two values, `True` and `False`.  Operations are limited to logical `not`, `and`, and `or`.

In [1]:
a = True
b = False
a and not b

True

In [2]:
c = False
not (a or b) and c

False

### `int`


This type represents the integer numbers. In Python 3, there is no limit to the size of these numbers, except for the limit imposed by the memory of your computer. The usual arithmetic operations are implemented: `+`, `-`, `*`, `//`, `%`, and `**`.

In [3]:
a = 3
b = -5
a*b

-15

Note a finer point: `-` does double duty as a unary and a binary operator. The unary operator changes the sign of an integer, and the binary operator represents subtraction.

In [4]:
a = 5
b = 3
a//b

1

On `int`, the `//` operator denotes integer division: the quotient is rounded down to a whole number.

In [5]:
a = 3
b = 5
a//b

0

The `%` operator is the modulo operator, and computes the remainder after integer division.

In [6]:
a = 5
b = 3
a % b

2
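Because `//` rounds down rather than towards zero, the results for negative operands can be surprising at first. The sketch below uses arbitrary example values to show this. It also checks the identity that ties `//` and `%` together, and it uses the built-in `divmod`, which computes both results at once.

```python
a = -5
b = 3
print(a // b)  # -2: -5/3 is about -1.67, rounded down to -2 (not -1)
print(a % b)   # 1: the remainder takes the sign of the divisor b
# // and % always satisfy a == (a // b) * b + a % b
print(a == (a // b) * b + a % b)  # True
# divmod returns the quotient and the remainder as a pair
print(divmod(a, b))  # (-2, 1)
```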

The ordinary division operator `/` can also be applied to `int` values, but the result is a `float`. The semantics of `/` for `int` changed from Python 2 to Python 3. In Python 2, `5/3` evaluated to the `int` value `1`.

In [7]:
a = 5
b = 3
a/b

1.6666666666666667

In [8]:
type(a/a)

float

Python also has the power operator `**`.

In [9]:
a = 2
b = 3
a**b

8

In [10]:
type(a**b)

int
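This sketch shows two further points about `int` and `**`. First, as mentioned above, `int` values are not limited in size, so even large powers are computed exactly. Second, a negative exponent produces a `float`, because the result is generally not a whole number.

```python
big = 2**100
print(big)         # 1267650600228229401496703205376, computed exactly
print(type(big))   # <class 'int'>
half = 2**-1
print(half)        # 0.5
print(type(half))  # <class 'float'>
```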

### Brief aside: assignment

A syntactic construct that occurs over and over again is an assignment that updates the value of a variable, e.g., `a = a + 2`. Given its prevalence, there is "syntactic sugar" to save some typing: `a += 2`. Similar shortcuts are defined for all arithmetic operators: `-=`, `*=`, `/=`, `//=`, `%=`, and `**=`. This holds not only for the operators defined on `int`, but also for those on `float` and `complex`, and for the concatenation operator `+` defined on `str` (see the following sections).
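The following minimal example uses arbitrary values. Each augmented assignment is shorthand for the longer form in its comment. The last one applies `+=` to a string.

```python
a = 5
a += 2    # same as a = a + 2, so a is 7
a //= 2   # same as a = a // 2, so a is 3
a **= 2   # same as a = a ** 2, so a is 9
print(a)  # 9
text = 'DNA'
text += ' fragment'  # same as text = text + ' fragment'
print(text)  # DNA fragment
```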

### `float`

This type represents real numbers. Note, however, that a `float` value is represented by 64 bits (double precision), which has a few important implications. Although we tend to think of a `float` value as a real number, this type cannot represent real numbers in the mathematical sense. The set of real numbers is infinite, and even uncountable, while the total number of `float` values is at most $2^{64}$. Hence a number of properties of real numbers do not hold for `float`, and computations with `float` numbers are subject to round-off errors. Information on the limits of `float` can be obtained through the `sys` module.

In [11]:
import sys
sys.float_info

sys.float_info(max=1.7976931348623157e+308, max_exp=1024, max_10_exp=308, min=2.2250738585072014e-308, min_exp=-1021, min_10_exp=-307, dig=15, mant_dig=53, epsilon=2.220446049250313e-16, radix=2, rounds=1)
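The following short sketch shows the round-off errors mentioned above in practice. It uses `math.isclose` (discussed below) for a tolerant comparison and the `epsilon` field of `sys.float_info` shown above.

```python
import sys
from math import isclose

x = 0.1 + 0.2
print(x)                # 0.30000000000000004
print(x == 0.3)         # False: none of 0.1, 0.2, 0.3 is exactly representable
print(isclose(x, 0.3))  # True: compare floats with a tolerance instead

eps = sys.float_info.epsilon
print(1.0 + eps == 1.0)      # False: eps is the gap between 1.0 and the next float
print(1.0 + eps / 2 == 1.0)  # True: the change is lost to round-off
```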

The usual operators are defined on `float`: `+`, `-`, `*`, `/`, `//`, `%` and `**`.

In [2]:
a = 3.1
b = 5.2
a/b

0.5961538461538461

The semantics of the `//` operator for floating-point numbers is floor division, i.e., division with the result rounded down to a whole number, which is still of type `float`.

In [13]:
a//b

0.0

The functions you would expect to be defined for floating-point numbers are available in the `math` module, and have to be imported before they can be used.

In [14]:
from math import sqrt
sqrt(a)

1.760681686165901

In [4]:
from math import isclose
# floor division and modulo satisfy a == (a//b)*b + a % b, up to round-off
isclose(a, (a//b)*b + a % b)

True

In [5]:
help(isclose)

Help on built-in function isclose in module math:

isclose(a, b, *, rel_tol=1e-09, abs_tol=0.0)
    Determine whether two floating point numbers are close in value.
    
      rel_tol
        maximum difference for being considered "close", relative to the
        magnitude of the input values
      abs_tol
        maximum difference for being considered "close", regardless of the
        magnitude of the input values
    
    Return True if a is close in value to b, and False otherwise.
    
    For the values to be considered close, the difference between them
    must be smaller than at least one of the tolerances.
    
    -inf, inf and NaN behave similarly to the IEEE 754 Standard.  That
    is, NaN is not close to anything, even itself.  inf and -inf are
    only close to themselves.



### `complex`

Complex numbers are represented by `complex`, and the same caveats regarding the representation of `float` hold for this type as well. The real and imaginary parts are each represented using 64 bits, so both are double precision.

In [15]:
a = 3 + 4j
b = 5 + 6j
a*b

(-9+38j)

Note that `j`, rather than `i`, is used to denote the imaginary part of a complex number. This notation is borrowed from engineering, where it is more common than in mathematics or physics.

Getting the real and imaginary parts of a complex number is quite straightforward, as is computing the complex conjugate.

In [16]:
a.imag

4.0

In [17]:
a.real

3.0

In [18]:
a.conjugate()

(3-4j)
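The sketch below shows a few more basic operations on `complex`. The built-in `abs` returns the modulus. The constructor `complex` builds a value from its real and imaginary parts. Both parts are of type `float`.

```python
a = 3 + 4j
print(abs(a))              # 5.0: the modulus, sqrt(3**2 + 4**2)
print(complex(3, 4) == a)  # True: build a complex from its real and imaginary parts
print(type(a.real))        # <class 'float'>
```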

The mathematical functions defined on complex numbers are available in the `cmath` module, e.g.,

In [19]:
from cmath import sqrt as csqrt

In [20]:
csqrt(a)

(2+1j)

Note that we imported the complex `sqrt` under a different name to avoid confusion with the `sqrt` function we already imported from `math`.  We use an _alias_.
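The two `sqrt` functions differ in more than their return type. As a minimal sketch, take a negative argument: `cmath.sqrt` returns a complex result, while `math.sqrt` raises a `ValueError`. The `try`/`except` construct catches that error so that the code keeps running.

```python
from math import sqrt
from cmath import sqrt as csqrt

print(csqrt(-1))  # 1j: the complex square root is defined for negative numbers
try:
    sqrt(-1)      # the real square root is not
except ValueError:
    print('math.sqrt raises ValueError for negative arguments')
```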

#### Your turn now: arithmetic and assignment

Consider the following Python statements:
  1. `print(a)`
  1. `a =  b + 3`
  1. `a = 7`
  1. `b += 4`
  1. `print(b)`
  1. `b = 1`

Order these statements (not necessarily all of them) such that the resulting code fragment produces the output:

    1
    8

Did you require all statements? Is there more than one possible ordering that will produce the correct output?

### `str`

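The type to represent textual data in Python is `str`. A value of this type is a sequence of characters. The most important operators are `+`, concatenation, and `in`, substring membership. In addition, `*` repeats a string a given number of times, e.g., `'AG' * 3` is `'AGAGAG'`.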

In [21]:
a = 'This is '
b = 'a text.'
a + b

'This is a text.'

In [22]:
a = 'this is some text'
b = 'some'
b in a

True

In [23]:
b = 'something'
b in a

False

Literal `str` values are delimited by single quotes, `'`, or by double quotes, `"`.

Almost all characters can be used in a `str` value. The exceptions are the single quote character `'` in single-quoted strings, the double quote character `"` in double-quoted strings, and the backslash `\`. To represent those characters in a string, we have to use `\'`, `\"`, and `\\`, respectively.

In [24]:
a = 'you\'ll have to take care to use \\\', but " is not a problem'
print(a)

you'll have to take care to use \', but " is not a problem


In [25]:
a = "you'll have to take care to use \\\", but ' is not a problem"
print(a)

you'll have to take care to use \", but ' is not a problem


Apart from quotes and backslash, a few other characters have a special representation in an `str`, notably the new line `\n`, the carriage return `\r`, and the tab character `\t`.

In [26]:
a = 'this is line 1\nthis is line 2\n\tthis is line 3, indented'
print(a)

this is line 1
this is line 2
	this is line 3, indented


On Microsoft Windows, a line does not end with a newline character alone. It ends with a carriage return followed by a newline, i.e., with `\r\n`. Under macOS and Linux, lines simply end with a newline. Mostly, you need not worry about this, since Python usually does the "right thing" on each operating system. Still, it is useful to be aware of the difference, because it can occasionally cause problems.
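Here is one example of such a problem, with made-up text. Splitting Windows-style text on `'\n'` leaves a trailing `'\r'` on each line. The `str` method `splitlines` recognizes both conventions.

```python
windows_text = 'line 1\r\nline 2\r\n'  # example data with Windows line endings
unix_text = 'line 1\nline 2\n'         # the same text with Linux/macOS line endings
print(windows_text.splitlines())  # ['line 1', 'line 2']
print(unix_text.splitlines())     # ['line 1', 'line 2']
print(windows_text.split('\n'))   # ['line 1\r', 'line 2\r', '']: note the '\r'
```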

It is easy to access the characters of a string individually by their index in the string. The first character has index 0, the second index 1, and so on.

In [27]:
a = 'abcde'
print(a[0], a[1])

a b


The function `len` can be used to determine the length of the string. The last character is therefore at `a[len(a) - 1]`, the second to last at `a[len(a) - 2]`, and so on.

In [28]:
print(a[len(a) - 1], a[len(a) - 2])

e d


However, Python has a nice shortcut to make this a lot easier.  The index of the last character in a string is -1, the one before last -2, and so on.

In [29]:
print(a[-1], a[-2])

e d


It is also straightforward to extract substrings. For instance, given `'abcde'`, we can select the 2nd up to and including the 4th character, i.e., `'bcd'`, as follows:

In [30]:
a = 'abcde'
a[1:4]

'bcd'

You might have expected `a[2:4]` rather than the above. However, as discussed earlier, the first character of the string has index 0, so the second is `a[1]`. Moreover, slicing in Python works "up to, but not including". Index 3 belongs to the 4th character, which we do want, so we specify 4, the index of the first character we do _not_ want to include.
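A useful consequence of the "up to, but not including" convention: the slice `a[start:end]` contains exactly `end - start` characters. Also, splitting a string at any index and concatenating the two parts gives back the original. The sketch below uses arbitrary example indices.

```python
a = 'abcde'
start = 1
end = 4
print(a[start:end])                      # bcd
print(len(a[start:end]) == end - start)  # True: end - start characters
print(a[:end] + a[end:] == a)            # True: splitting at end loses nothing
```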

Selecting the first 3 characters from a string can be done as follows:

In [32]:
a[:3]

'abc'

Selecting all characters from the 4th to the end of the string is done similarly.

In [33]:
a[3:]

'de'

Suppose that, for whatever obscure reason, we want every second character of a string, from the 2nd up to the 7th. We can use the most general form of slicing, in this case with a step size of 2.

In [34]:
a = 'abcdefghijklm'
a[1:7:2]

'bdf'

As before, we can leave out either the start index (so we start from the beginning of the string), or the end index (so we stop at the end of the string), or both. Moreover, the step size can be negative. We can use this to reverse a string.

In [35]:
a[::-1]

'mlkjihgfedcba'

This selection mechanism, called slicing, is quite universal in Python, and we will encounter it again when discussing lists and arrays.

Perhaps you expect that it is possible to modify one or more characters in a string using this mechanism, but that won't work. Values of type `str` are immutable, i.e., once created, they cannot be modified.

In [36]:
a[0] = 'A'

TypeError: 'str' object does not support item assignment
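To get a string with a different first character, we build a new string instead, for instance by concatenating the new character with a slice of the old string. The original string is left untouched.

```python
a = 'abcdefghijklm'
b = 'A' + a[1:]  # a new string: 'A' followed by all but the first character of a
print(b)  # Abcdefghijklm
print(a)  # abcdefghijklm: the original is unchanged
```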

Iterating over the characters of a string can be done using a for-loop, as shown below.

In [37]:
i = 0
for character in a:
    print('character', i + 1, 'is', character)
    i += 1

character 1 is a
character 2 is b
character 3 is c
character 4 is d
character 5 is e
character 6 is f
character 7 is g
character 8 is h
character 9 is i
character 10 is j
character 11 is k
character 12 is l
character 13 is m
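Keeping a counter by hand, as above, is common enough that Python has a built-in function for it: `enumerate`. This version is a sketch that produces the same kind of output for a shorter string. `start=1` makes the count begin at 1 instead of 0.

```python
a = 'abcde'
# enumerate yields (counter, character) pairs
for i, character in enumerate(a, start=1):
    print('character', i, 'is', character)
```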


The `str` class defines many very useful methods for string manipulation.

In [38]:
print(a.upper(), a.capitalize())

ABCDEFGHIJKLM Abcdefghijklm


Note that the value of the variable `a` was not modified. Remember, `str` values are immutable, so these methods return a new string.

In [39]:
a

'abcdefghijklm'

Finding the position of a substring in a string is also easy, using the `find` method. If the string doesn't contain the substring, -1 will be returned.

In [40]:
a.find('cde')

2

In [41]:
a.find('edc')

-1

When reading data from a text file, one often has to remove whitespace from a string, at its start, at its end, or both. `str` has methods to conveniently do just that: `lstrip`, `rstrip`, and `strip`.

In [42]:
a = '  some text  '
print(f"'{a.lstrip()}'")
print(f"'{a.rstrip()}'")
print(f"'{a.strip()}'")

'some text  '
'  some text'
'some text'


Here, an f-string was used to format the output. Ignore this for now; we will discuss f-strings in some detail shortly.

`split` is another `str` method that is very useful. As its name implies, it can be used to split strings. If it is called without any arguments, it will split on whitespace, otherwise it will split on the argument string.  It returns a list of strings.

In [43]:
a = 'alpha  0.3     9'
print(a.split())

['alpha', '0.3', '9']


Below, the data in the string is separated by semicolons, so we call `split` accordingly. Note that the spaces following each semicolon are kept in the result.

In [44]:
a = 'alpha; 0.3; 9'
print(a.split(';'))

['alpha', ' 0.3', ' 9']


However, some care is required.  If we specifically use the space character to split, the result will be different from what we might expect.

In [1]:
a = 'alpha  0.3     9'
print(a.split())
print(a.split(' '))

['alpha', '0.3', '9']
['alpha', '', '0.3', '', '', '', '', '9']


A split is done on each and every space character, so we will end up with a number of empty strings.

Roughly the opposite of `split` is `join`. Given a list of strings, it creates a single string by concatenating them. The original strings are separated by some string, e.g., a semicolon.

In [45]:
separator = ';'
lot_of_strings = ['alpha', 'beta', 'gamma']
print(separator.join(lot_of_strings))

alpha;beta;gamma


At first glance, it is somewhat counter-intuitive that the `join` method is invoked on the separator string, rather than on the list. There is a very good reason for this; think about it.

When producing strings that are textual representations of integers or floating-point numbers, we often need to format them appropriately. By way of example, we will produce a table with the integers from 0 to 9 and their square roots.

In [46]:
from math import sqrt
for i in [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]:
    print(i, sqrt(i))

0 0.0
1 1.0
2 1.4142135623730951
3 1.7320508075688772
4 2.0
5 2.23606797749979
6 2.449489742783178
7 2.6457513110645907
8 2.8284271247461903
9 3.0


However, I'd prefer to separate the integer from the floating point value by a semicolon, and all floating point values should only be displayed with 4 decimal digits.  This is where f-strings (formatted strings, Python 3.6+) come in very handy. 

In [47]:
from math import sqrt
for i in [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]:
    print(f'{i:d};{sqrt(i):.4f}')

0;0.0000
1;1.0000
2;1.4142
3;1.7321
4;2.0000
5;2.2361
6;2.4495
7;2.6458
8;2.8284
9;3.0000


Let's break that down. The f-string is delimited by `f''` (or `f""` if you prefer). Anything enclosed by braces `{...}` in such a string will be replaced. Take `{i:d}`: it is replaced by the current value of the variable `i`, and the format specification `:d` states that `i` should be formatted as a decimal integer. The next character is the semicolon, which is simply a literal part of the string. Next comes another pair of braces, `{sqrt(i):.4f}`, which is also replaced. The function call `sqrt(i)` is executed, and its return value is of type `float`. Hence the format specification `.4f`, which converts that floating-point value to a string with 4 digits after the decimal point.
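A format specification can also set a minimum field width, which is convenient for aligning the columns of a table. In the sketch below, the input values are arbitrary. `>3d` right-aligns an integer in a field 3 characters wide. `8.4f` pads a float to 8 characters, with 4 digits after the decimal point.

```python
from math import sqrt

for i in [1, 4, 10]:
    # >3d: integer, right-aligned in a field 3 characters wide
    # 8.4f: float, field 8 characters wide, 4 digits after the decimal point
    print(f'{i:>3d};{sqrt(i):8.4f}')
```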

f-strings have many formatting options, so it is a good idea to consult the documentation to get an overview of all their magic. Another example is formatting a date.

In [48]:
day = 3
month = 10
year = 2018
sep = '-'
print(f"{day:02d}{sep}{month:02d}{sep}{year:d}")

03-10-2018


Two new features were introduced here. The first is the format specification `02d` for values of type `int`, which means that at least two digits are shown. If the integer has only a single digit, as is the case for `day`, it is left-padded with `0`. The second is the straightforward substitution of a variable that has a string value, `sep`. Without a format specification, a value is simply converted to a string, so none is required here. However, we could have written `{sep:s}`, since `s` is the format specifier for strings.

Note that the book uses the original Python string formatting syntax. Although you still see this frequently in existing code, I would not recommend this style for new code, and will use f-strings throughout the course.
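You will still meet the older styles in existing code. They are `%`-formatting and the `str.format` method. As a sketch for comparison, the line below is produced in all three styles, and the output is `2;1.4142` each time.

```python
from math import sqrt

i = 2
print('%d;%.4f' % (i, sqrt(i)))          # printf-style formatting with %
print('{:d};{:.4f}'.format(i, sqrt(i)))  # the str.format method
print(f'{i:d};{sqrt(i):.4f}')            # f-string, as used in this course
```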

#### Your turn now: find substrings

Consider the string `'AGGTCAAGGTAGTCCAGGTAGGTCA'` that represents a DNA fragment. Use `find` to search for the substring `'AGGT'`. Note that `'AGGT'` occurs four times in the DNA string. What will be the result?

1. 1
1. 0
1. [0, 6, 15, 19]
1. [1, 7, 16, 20]

Check the [documentation of the method `rfind`](https://docs.python.org/3.6/library/stdtypes.html#string-methods), and use it to find the position of the last occurrence of `'AGGT'` in the DNA string by replacing the `____` by the appropriate method call.

In [None]:
dna = 'AGGTCAAGGTAGTCCAGGTAGGTCA'
dna.____

In the following code fragment, replace the `____` so that the code fragment will display all indices in the DNA string where the substring `'AGGT'` occurs. You will need to consult the [documentation for the `find` method](https://docs.python.org/3.6/library/stdtypes.html#string-methods).

In [None]:
dna = 'AGGTCAAGGTAGTCCAGGTAGGTCA'
subdna = 'AGGT'
pos = dna.find(____)
while pos >= 0:
    print(pos)
    pos = dna.find(____)

#### Your turn now: formatting

Consider formatting a time: we would like a string of the form `HH:MM:SS`, e.g., `07:23:05`. In the following code snippet, replace `____` with a format string that produces the desired output for all reasonable values of `hours`, `minutes`, and `seconds`. You can assume that `hours` will be between 0 and 23, and that `minutes` and `seconds` will be between 0 and 59.

In [None]:
hours = 5
minutes = 23
seconds = 5
print(____)

#### Your turn now: string methods

In the string `'  AGGAGTAA  ;GGATACCT  ;  TCCATTAC '`, the semicolon separates three DNA fragments. Replace `____` in the following code fragment so that for each of those fragments only the first four nucleotides are shown, followed by an ellipsis (i.e., `...`). For example, the first DNA fragment would be printed as `AGGA...`.

In [None]:
dna_strings = '  AGGAGTAA  ;GGATACCT  ;  TCCATTAC '
for dna_string in dna_strings.____:
    dna_string = dna_string.____
    print(f'{dna_string[____]}...')

## Collection data types

Python has a number of built-in collection data types that are the bread and butter of everyday programming: `tuple`, `list`, `set`, and `dict`. Several more are available in modules of the Python standard library, but we will not discuss these here.

Collection types, as the name implies, are data structures that can store multiple values.

### `list`

A list represents a sequence of values that need not be unique, e.g., a sequence of species whose DNA you have been studying.

In [49]:
species = ['homo sapiens', 'felis catus', 'homo sapiens', 'danio rerio']

The list `species` contains 4 elements: the first and third are `'homo sapiens'`, the second is `'felis catus'`, and the last is `'danio rerio'`. In this case, all four elements have the same type, `str`. Although that is quite common, it is not necessarily the case; lists can contain elements of various types.

For strings, we used the function `len` to determine the number of characters in a string. This same function does double duty: it also determines the number of elements in a list.

In [50]:
len(species)

4

The indexing mechanism that we used to get characters out of a string works for elements of lists as well, so to get the second element, `'felis catus'`, we use index 1.

In [51]:
species[1]

'felis catus'

The last element can be addressed using index -1.

In [52]:
species[-1]

'danio rerio'

Selecting the second and third elements can be done by slicing.

In [53]:
species[1:3]

['felis catus', 'homo sapiens']

Note that the result of a slice is of type `list`.

In [54]:
type(species[1:3])

list

A list can be empty: it has no elements, and its length is zero.

In [55]:
done = list()
len(done)

0

To check whether a value occurs as an element of a list, we can use the `in` operator, e.g.,

In [56]:
'danio rerio' in species

True

In [57]:
'rerio' in species

False

In [58]:
['homo sapiens', 'felis catus'] in species

False

Note the difference in semantics of the `in` operator when used with `str` and with `list`. For `str`, the `in` operator tests whether a substring occurs in a string. For `list`, it tests membership, i.e., whether a value occurs as an element of the list. That is why the last example is `False`: the list `['homo sapiens', 'felis catus']` is not itself an element of `species`.

To find the index of the first element equal to a given value, we can use the `index` method.

In [59]:
species.index('homo sapiens')

0

Again, this is very similar to the `find` method defined for `str` (in fact, `index` is also defined for `str`); however, there is a caveat.

In [60]:
species.index('sapiens')

ValueError: 'sapiens' is not in list

Rather than returning -1 when the element cannot be found, the `index` method raises an exception, i.e., it generates an error. Although this can be dealt with elegantly, for now we simply test first whether the element occurs in the list, and only then call `index`.

In [61]:
if 'sapiens' in species:
    position = species.index('sapiens')
    print(f'found at {position}')
else:
    print('element not found')

element not found


As opposed to strings, lists are mutable, so we can assign new values to their elements, e.g., replace the second element of the list.

In [62]:
species[1] = 'canis familiaris'
species

['homo sapiens', 'canis familiaris', 'homo sapiens', 'danio rerio']

Currently, the list has four elements, but we can add more at the end using the `append` method.

In [63]:
species.append('canis lupus')
species

['homo sapiens',
 'canis familiaris',
 'homo sapiens',
 'danio rerio',
 'canis lupus']

A `list` can also be modified by removing an element at a given index. A convenient method for this is `pop`, since it also returns the element that was removed. Let's remove the dog at index 1.

In [64]:
removed = species.pop(1)
print(removed, species)

canis familiaris ['homo sapiens', 'homo sapiens', 'danio rerio', 'canis lupus']


A list can be sorted using the function `sorted`. It returns a new list with the elements in their natural order and leaves the original list unchanged. For strings, that order is lexicographic.

In [65]:
sorted(species)

['canis lupus', 'danio rerio', 'homo sapiens', 'homo sapiens']

#### Your turn now: insert an element

Another way to add elements to a list is by using the `insert` method. Read the [documentation of `insert`](https://docs.python.org/3.6/library/stdtypes.html#lists), and apply it to insert another cat `'felis catus'` as the third element in the list. The resulting list should be:

`['homo sapiens', 'homo sapiens', 'felis catus', 'danio rerio', 'canis lupus']`

Jupyter notebooks give you immediate access to a brief description of what a function or method does, and the arguments that you should use. Compare the output of the `help` command with the online documentation you just consulted.

In [None]:
help(list.insert)

Is the list above a good test to assess your understanding of the `insert` method?  Why not?

Why does it make sense to insert the new element _before_ the element at the given index?

#### Your turn now: reversing a list

One way to create a copy of a list, though probably not the most efficient one, is the following code.

In [28]:
original = [3, 5, 7, 9]
copy = []
for element in original:
    copy.append(element)
copy

[3, 5, 7, 9]

How would you complete the following code fragment to create a copy of the original list, but with the elements in the reversed order, i.e., so that `copy` has `[9, 7, 5, 3]` as value?  Replace `____` with an appropriate method call.

In [None]:
original = [3, 5, 7, 9]
copy = []
for element in original:
    copy.____
copy

There are indeed more convenient ways to copy and reverse a list than the code fragments above, look for them in the [documentation on `list`](https://docs.python.org/3.6/library/stdtypes.html#lists).

#### Your turn now: sort order

Check Wikipedia for the definition of lexicographic versus alphabetic order for strings. Based on what you find, which result of calling `sorted` on `['C', 't', 'T', 'A', 'g']` would you expect, and why?

1. `['A', 'C', 'g', 't', 'T']`
1. `['A', 'C', 'T', 'g', 't']`

### `tuple`

Tuples are in many ways similar to lists, but they are, just like strings, immutable. Once created, they cannot be modified.

By way of example, consider a tuple that has two elements: the first is the longitude, and the second is the latitude of the location where an observation was made.

In [23]:
location = (5.34, 54.17)

We get the longitude at index 0, just as for lists.

In [24]:
location[0]

5.34

However, we cannot change the tuple.

In [None]:
location[1] = 55.17

Consequently, a `tuple` has no methods such as `append` or `insert`.
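The sketch below shows what happens when we try to change the tuple anyway. The assignment raises a `TypeError`, which the `try`/`except` construct catches so that the code keeps running. To "change" a tuple, we build a new one instead.

```python
location = (5.34, 54.17)
try:
    location[1] = 55.17
except TypeError:
    print('a tuple cannot be modified')
# "changing" a tuple means building a new one
new_location = (location[0], 55.17)
print(location, new_location)  # (5.34, 54.17) (5.34, 55.17)
```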

#### Your turn now: `tuple` versus `list`

Why would anyone use a `tuple` when a `list` can do so much more?

### `set`

As opposed to a `list`, elements of  a `set` are unique, i.e., each element occurs exactly once. Another difference is that in a `list`, we have a first, a second, a last element, so a `list` is ordered. Elements in a `set` are not ordered at all, so a phrase like "the first element of a set" is entirely meaningless.

The mathematical notion of a set maps directly to the semantics of the `set` type in Python.

Let's start with an empty set, add the animals in a list one by one, and print the `set` each time to see how it changes.

In [4]:
animals = set()
for animal in ['cat', 'fish', 'bird', 'cat', 'bird', 'dog']:
    animals.add(animal)
    print(f'adding {animal}: {animals}')

adding cat: {'cat'}
adding fish: {'fish', 'cat'}
adding bird: {'fish', 'bird', 'cat'}
adding cat: {'fish', 'bird', 'cat'}
adding bird: {'fish', 'bird', 'cat'}
adding dog: {'fish', 'bird', 'cat', 'dog'}


Although both `'cat'` and `'bird'` were added twice, they occur only once in the `set`, exactly what we would expect from set theory in mathematics.

You can already guess how to get the number of elements in a set, and how to test membership; it is indeed similar to how it works for `list`.

In [5]:
len(animals)

4

In [6]:
'bird' in animals

True

In [7]:
'wolf' not in animals

True

Removing an element from a set can be done using the `discard` method, which doesn't do anything if the value we try to remove is not a member of the set.

In [9]:
animals.discard('fish')
animals

{'bird', 'cat', 'dog'}

In [10]:
animals.discard('wolf')
animals

{'bird', 'cat', 'dog'}

All the set operations you would expect are defined as methods, e.g., union, intersection, set difference, and so on. If you are curious, check the [documentation](https://docs.python.org/3.6/library/stdtypes.html#set-types-set-frozenset).
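Besides the methods, these operations are also available as operators: `|` for union, `&` for intersection, and `-` for difference. The sketch below uses made-up example sets. Since sets are unordered, the printed order may differ from the comments.

```python
birds = {'sparrow', 'eagle', 'penguin'}  # example data
flyers = {'sparrow', 'eagle', 'bat'}     # example data
print(birds | flyers)  # union: sparrow, eagle, penguin and bat
print(birds & flyers)  # intersection: sparrow and eagle
print(birds - flyers)  # difference: {'penguin'}
print(birds.intersection(flyers) == birds & flyers)  # True: method and operator agree
```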

Many types of data can be stored in a `set`, e.g., `int`, `float`, `str`, and `tuple` (provided its elements are hashable), but not `list`, `set`, and `dict`. Technically, types that can be stored in a `set` have to be _hashable_.
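The following minimal sketch shows the hashability restriction. A tuple of coordinates can be added to a set, but trying to add a list raises a `TypeError`.

```python
items = set()
items.add((5.34, 54.17))  # a tuple of floats is hashable
try:
    items.add([5.34, 54.17])  # a list is mutable, hence not hashable
except TypeError:
    print('a list cannot be stored in a set')
print(items)  # {(5.34, 54.17)}
```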

#### Your turn now: popping from sets

The `pop` method is also defined on `set`, but its semantics differs from that for `list`. Check the [documentation for `pop` on `set`](https://docs.python.org/3.6/library/stdtypes.html#set-types-set-frozenset). Why can't you `pop` from a set using an index?

### `dict`

The last collection type we will discuss here is the dictionary, `dict`. As in a real-world dictionary, the information is stored as key/value pairs: you look up a word in a dictionary (the key), and you read its definition (the value). The keys in a dictionary can be of any type that can be stored in a `set`, so any hashable type. The values can have any type.

As an example, take a dictionary that maps nucleotides (keys) to the number of times they occur in a DNA string `'AGGTCCCAATGA'`. The `A` occurs 4 times, the `C` 3, the `G` 3, and the `T` 2 times. The keys in the dictionary `counts` are of type `str`, the values of type `int`. For the sake of example, we leave out the value for `T`.

In [3]:
counts = {
    'A': 4,
    'C': 3,
    'G': 3,
}

Getting the value for a given key is very similar to accessing list elements. Instead of using an index to access an element, we use the key.

In [2]:
counts['C']

3

In [4]:
counts['T']

KeyError: 'T'

When we try to use a key that is not in the dictionary, we get a `KeyError`. We can test whether a `dict` has a given key using the `in` operator.

In [10]:
'T' in counts, 'A' in counts

(False, True)
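As an alternative to testing with `in` first, the `dict` method `get` takes a default value, which it returns when the key is missing instead of raising a `KeyError`. The following is a minimal sketch on the same `counts` dictionary.

```python
counts = {'A': 4, 'C': 3, 'G': 3}
print(counts.get('C', 0))  # 3
print(counts.get('T', 0))  # 0: the default is returned for a missing key, no KeyError
```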

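Adding a key/value pair to a `dict` is done by assignment, with the same syntax as assigning to a list element. Unlike a list index, however, the key need not exist yet; if it doesn't, the key/value pair is added.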

In [6]:
counts['T'] = 2

Iterating over the keys of a `dict` is straightforward. The keys of a `dict` can be thought of as a set, since each key occurs only once. In older versions of Python, the iteration order was undefined. Since Python 3.7, however, iteration follows the order in which the keys were inserted.

In [7]:
for nucleotide in counts:
    print(f'{nucleotide} occurs {counts[nucleotide]} times')

A occurs 4 times
C occurs 3 times
G occurs 3 times
T occurs 2 times
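The method `items` yields the key/value pairs directly, so the separate lookup `counts[nucleotide]` is not needed. This sketch rebuilds the complete `counts` dictionary and produces the same output as above. It relies on the insertion-order guarantee of Python 3.7+.

```python
counts = {'A': 4, 'C': 3, 'G': 3, 'T': 2}
# items() yields (key, value) pairs in insertion order
for nucleotide, count in counts.items():
    print(f'{nucleotide} occurs {count} times')
```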


To remove a key/value pair from a `dict`, the `pop` method can be used. It removes the key and its value, and returns the latter.

In [8]:
counts.pop('A')

4

The size of the `dict`, i.e., the number of key/value pairs it contains, can be retrieved using the `len` function.

In [9]:
len(counts)

3

#### Your turn now: counting nucleotides

Above, we created the `counts` dictionary by hand. Replace the `____` in the following code fragment so that it is computed.

In [None]:
dna = 'AGGTCCCAATGA'
counts = ____
for nucleotide in dna:
    counts[____] = ____
for nucleotide in counts:
    print(f'{____} occurs {____} times')

In the second line, you'd like an empty dictionary. If you think of how you got an empty `list` or `set`, what is your guess for `dict`? If necessary, check the [documentation](https://docs.python.org/3.6/tutorial/datastructures.html#dictionaries).

#### Your turn now: `dict` and mathematics

We already remarked on the similarity between a set in mathematics, and the `set` type in Python. There is a mathematical concept that closely corresponds to a `dict` in Python, what is it?

### Comprehensions

Python has the concept of `list`, `set`, and `dict` comprehensions, i.e., creating a new data structure out of an existing one. Again, the analogy with mathematics, in particular set-builder notation, is quite strong. We will restrict ourselves to `list` comprehensions here.

Consider the following `list` of DNA fragments.

In [1]:
dna = ['AAGAATATCC', 'CGAAATGG', 'CGG']

We can conveniently construct a new `list` that contains integers, representing the length of each fragment in `dna`.

In [2]:
dna_lengths = [len(fragment) for fragment in dna]
dna_lengths

[10, 8, 3]

In the code above, the function `len` is applied to each element in the `list` `dna`, and the result is added to a new `list`, which we assign to the variable `dna_lengths`.

Comprehensions are used a lot in Python, and, when used judiciously, can lead to code that is easier to understand. However, when overused, complicated comprehensions will obfuscate your intentions.
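A comprehension can also contain a condition, so that only some elements make it into the new list. The sketch below uses an arbitrary length threshold of 5 and keeps only the longer fragments. It is followed by the equivalent `for` loop.

```python
dna = ['AAGAATATCC', 'CGAAATGG', 'CGG']
# keep only the fragments that are longer than 5 nucleotides
long_fragments = [fragment for fragment in dna if len(fragment) > 5]
print(long_fragments)  # ['AAGAATATCC', 'CGAAATGG']

# the equivalent for-loop
long_fragments_loop = []
for fragment in dna:
    if len(fragment) > 5:
        long_fragments_loop.append(fragment)
print(long_fragments_loop == long_fragments)  # True
```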

#### Your turn now: capitalize

Given a `list` of DNA fragments `dna_orig`, replace `____` in the following code to create a new list that has all-uppercase symbols.

In [None]:
dna_orig = ['agTCCaA', 'AAtgc', 'TTACT']
dna = ____
for fragment in ____:
    dna.____

Rewrite the code above to use a `list` comprehension to obtain the same result.