# Lesson 3

Jurre Hageman

## Other data types

First, we will cover some other data types that are not collections.  
These include:
- None
- Booleans

## None

The `None` keyword is used to define... nothing: a null value, or no value at all. It acts as a kind of flag for "nothing". It is definitely **not** the same as 0, an empty string, an empty list, etc.

Imagine asking someone how many people have signed up for an event by putting their name on a list. 0 would mean that no people have signed up. It means that you *do* know something: you know that there are no people on the list.
`None`, on the other hand, means that there is no value. No one counted the people on the list, or maybe the list got lost and nobody can even count them. It, in a way, indicates the absence of information.

So why is `None` useful? Imagine you are analyzing data from a digital weather station. You are recording the temperature in degrees Celsius, but the device has not booted yet. How do you store the temperature? There are several options that might seem to work:
- 0 might seem like the easiest solution, but it has an obvious problem: the device could also measure a temperature of 0. Once the values are saved, it is impossible to tell whether a 0 was stored because it was cold that day or because the sensor did not work.
- A very large number is not much better. Yes, you should never measure 9999 degrees Celsius, but the computer does not know that. You always have to perform extra checks yourself, which easily leads to confusion and (unnoticed) errors.
- An empty string could never be mistaken for a number, but this also has problems. A string (as we have already seen) does not (usually) behave like a number, which can lead to strange errors that are hard to interpret.

You can see that `None` comes in very handy here. It is clear that there is no temperature received yet.

In [1]:
temp = None
print(temp)
temp = 20.4
print(temp)

None
20.4
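
To test whether a variable holds `None`, the idiomatic check is `is None` (or `is not None`) rather than `== None`. Below is a minimal sketch. It assumes a hypothetical list of weather-station readings in which `None` marks a missing measurement, and it keeps only the real values. Note that `0.0` survives the check because it is a real measurement.

```python
# Hypothetical temperature readings (degrees Celsius); None = no measurement received
readings = [20.4, None, 0.0, 18.9, None]

valid = []  # collect only the real measurements
for value in readings:
    if value is not None:  # 0.0 is a real measurement and is kept
        valid.append(value)

print(valid)       # [20.4, 0.0, 18.9]
print(len(valid))  # 3
```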


## Boolean

Booleans are used very often in programming.  
There are only two Boolean values:

- `True`
- `False`

Comparisons in Python (such as `x > y`) evaluate to `True` or `False`.  
Some examples:

In [2]:
x = 4
y = 5
print(x > y)
print(x < y)

False
True


`True` has the same value as 1 in Python while `False` has the same value as 0:

In [3]:
print(True == 1) # == tests for equality
print(False == 0)
print(True == 2)
print(True == 0)
print(False == 1)

True
True
False
False
False
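
Because `True` behaves like 1 and `False` like 0, you can do arithmetic with them. A useful consequence is that summing a collection of booleans counts how many of them are `True`. The small sketch below demonstrates this; it uses a list, which is covered later in this lesson.

```python
print(True + True)   # 2
print(True * 10)     # 10

checks = [3 > 2, 1 > 5, 7 == 7]  # [True, False, True]
print(sum(checks))   # 2: the number of True values
```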


There is a bit more to know about booleans. It is important to understand how data collections are evaluated using the `bool` function. More on that in a later lesson.

## Collections: strings

We already covered some basics about strings. Strings are collections of Unicode characters. They are iterables.
An example:

In [4]:
dna = "GAATCTGG"

## String indexing

- String positions can be specified using an index
- When working from the start, indexes start at 0
- When working from the end, indexes start at -1
![indexing](figs/fig1.png)

In [5]:
dna = "GATC"
print(dna[0])
print(dna[2])
print(dna[-1])

G
T
C


What happens if you try to index a position that does not exist?  
Try it!

In [6]:
#print(dna[4])

## Manipulating strings

> Important: in Python, strings are immutable. Every operation that seems to manipulate a string actually gives you a new string object. You cannot change a string in place: trying to do so results in a TypeError.

In [7]:
dna = "GATC"
#dna[0] = "A" # Results in a TypeError

In order to change a string, you will need to make a new string and overwrite the variable:

In [8]:
dna = "GATC"
print(dna)
dna = "AATC"
print(dna)

GATC
AATC


Nevertheless, you can perform operations on strings. Just remember that, because strings are immutable, Python always returns a new string.

In [9]:
print(dna)
print(dna + dna)
print(dna * 10)

AATC
AATCAATC
AATCAATCAATCAATCAATCAATCAATCAATCAATCAATC
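
String methods follow the same rule: they return a new string and leave the original untouched. The short sketch below shows this with the built-in `str.lower` and `str.replace` methods.

```python
dna = "AATC"
lower = dna.lower()          # new string "aatc"
rna = dna.replace("T", "U")  # new string "AAUC"
print(lower)  # aatc
print(rna)    # AAUC
print(dna)    # AATC: the original string is unchanged
```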


### String slicing

You can use slicing to create a substring.
- `seq[x:y:z]` returns a slice that starts at, and includes, index `x` (or 0 if omitted). It runs up to, but excludes, index `y` (or to the end if omitted), and takes every `z`-th character (step `z`, or 1 if omitted).
- A slice always returns the same data type as the object you slice: a slice of a string is a string, and a slice of a list is a list. Indexing is different. Indexing a string always returns a string, but indexing a list (see below) returns the element stored at that position, which is not necessarily a list.
Some examples:

In [10]:
seq = "GATC"
print(seq[1:]) # from pos 1 to end (and including end)
print(seq[:3]) # from pos 0 to 3 (not including 3)
print(seq[::2]) # start to end (including end) with step 2
print(seq[::-1]) # reverse a string

ATC
GAT
GT
CTAG
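
Slicing combined with concatenation lets you "change" a character in a string. Because strings are immutable, you do this by building a new string from pieces of the old one. In the minimal sketch below, the variable names `new_seq`, `pos` and `mutated` are just illustrative.

```python
seq = "GATC"
new_seq = "A" + seq[1:]  # "A" followed by everything from index 1
print(new_seq)           # AATC
print(seq)               # GATC (unchanged)

# replace the base at index 2 by building a new string around it
pos = 2
mutated = seq[:pos] + "G" + seq[pos + 1:]
print(mutated)           # GAGC
```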


## The `len` function

The `len` function works on all collections. It returns the number of items in the collection.

In [11]:
seq = "GATC"
print(len(seq))

4
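
Combined with indexing, `len` tells you where the last character sits. Because positions start at 0, the last index is `len(seq) - 1`, which is the same position as `-1`:

```python
seq = "GATC"
last = len(seq) - 1    # 3
print(seq[last])       # C
print(seq[-1])         # C: same position, counted from the end
print(seq[-len(seq)])  # G: index -4 is the first character
```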


## Lists

- Lists are … lists (of objects)
- Similar structures are frequently called arrays in other languages
- They contain ordered series of objects: characters, strings, numbers, or any other Python object
- They are quite similar to strings in many ways, with one important difference: lists are mutable
- Lists are created using square brackets, with elements separated by commas
- You can also use indexing and slicing on lists

In [12]:
my_empty_list = []
sequences = ['atc', 'agg', 'ttt']
print(sequences[1])
print(sequences[1:]) # slicing
print(sequences[1:2]) # slicing. 

agg
['agg', 'ttt']
['agg']


> Note that a slice always returns the same data type as the object you slice:  
> - Slicing (`sequences[1:2]`) returns a list, here with a single element.
> - Indexing (`sequences[1]`) returns the element itself (a string in this case).
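
You can confirm the difference with the built-in `type` function:

```python
sequences = ['atc', 'agg', 'ttt']
print(type(sequences[1]))    # <class 'str'>: indexing returns the element
print(type(sequences[1:2]))  # <class 'list'>: slicing returns a list
```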

## List manipulations

You can append to an existing list:

In [13]:
sequences = ['atc', 'agg', 'ttt']
print(sequences)
sequences.append("ggg")
print(sequences)

['atc', 'agg', 'ttt']
['atc', 'agg', 'ttt', 'ggg']


Or concatenate an existing list with another list:

In [14]:
sequences = ['atc', 'agg', 'ttt']
more_sequences = ['ccc', 'ggg']
total_sequences = sequences + more_sequences
print(total_sequences)

['atc', 'agg', 'ttt', 'ccc', 'ggg']


You can also change the content of a list:

In [15]:
total_sequences = ['atc', 'agg', 'ttt', 'ccc', 'ggg']
print(total_sequences)
total_sequences = ["aaa"] + total_sequences # use concatenation to prepend
print(total_sequences)
total_sequences[1] = 'ata' # replace item
print(total_sequences)
total_sequences.remove('ccc') # remove by value (first occurrence)
print(total_sequences)
del total_sequences[1] # remove by index
print(total_sequences)

['atc', 'agg', 'ttt', 'ccc', 'ggg']
['aaa', 'atc', 'agg', 'ttt', 'ccc', 'ggg']
['aaa', 'ata', 'agg', 'ttt', 'ccc', 'ggg']
['aaa', 'ata', 'agg', 'ttt', 'ggg']
['aaa', 'agg', 'ttt', 'ggg']
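
Besides concatenation and `del`, lists have built-in methods for inserting and removing items by position. The short sketch below uses the standard list methods `list.insert` and `list.pop`. Unlike `["aaa"] + total_sequences`, which builds a new list, both methods modify the existing list in place.

```python
seqs = ['aaa', 'agg', 'ttt', 'ggg']
seqs.insert(0, 'ccc')  # insert 'ccc' at index 0 (prepend in place)
print(seqs)            # ['ccc', 'aaa', 'agg', 'ttt', 'ggg']
last = seqs.pop()      # remove and return the last item
print(last)            # ggg
first = seqs.pop(0)    # remove and return the item at index 0
print(first)           # ccc
print(seqs)            # ['aaa', 'agg', 'ttt']
```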


You can also insert, replace or delete multiple items using slice assignment:

In [16]:
my_list = ["atc", "atg"]
my_list[1:1] = ["ccc", "ggg"] # insert multiple items at index
print(my_list)
my_list[1:3] = ["aaa", "ttt"] # change multiple items
print(my_list)
my_list[1:3] = [] # delete multiple items
print(my_list)

['atc', 'ccc', 'ggg', 'atg']
['atc', 'aaa', 'ttt', 'atg']
['atc', 'atg']


Making a copy of a list often leads to confusion among new Python programmers.  
The following code will **not** copy items from a list:

In [17]:
list1 = [1, 2, 3]
list2 = list1 # not a copy. list1 and list2 now point to the same object in memory
print(list1)
print(list2)
list2.append(4) # add 4 to the end
print(list2)
print(list1) # list1 has changed too: it points to the same object

[1, 2, 3]
[1, 2, 3]
[1, 2, 3, 4]
[1, 2, 3, 4]


If you want to copy items from a list you can use a slice:

In [18]:
list1 = [1, 2, 3]
list2 = list1[:] # slices all items from list1 to a new list object
print(list1)
print(list2)
list2.append(4) # add 4 to the end
print(list2)
print(list1) 

[1, 2, 3]
[1, 2, 3]
[1, 2, 3, 4]
[1, 2, 3]


Or you can use the `list.copy` method:

In [19]:
list1 = [1, 2, 3]
list2 = list1.copy()
print(list1)
print(list2)
list2.append(4) # add 4 to the end
print(list2)
print(list1) 

[1, 2, 3]
[1, 2, 3]
[1, 2, 3, 4]
[1, 2, 3]


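> The `list.copy` method (like slicing with `[:]`) makes a shallow copy: the new list holds references to the same element objects as the original. We will not cover deep copies of lists in this course, but the standard library provides a function for this (the `deepcopy` function in the `copy` module).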
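
The difference only matters when a list contains other mutable objects, such as lists (see nested lists below). The minimal sketch below uses the standard library's `copy.deepcopy`. The names `table`, `shallow` and `deep` are just illustrative.

```python
import copy

table = [[1, 2], [3, 4]]
shallow = table.copy()       # new outer list, but the SAME inner lists
deep = copy.deepcopy(table)  # new outer list AND new inner lists

table[0].append(99)          # modify an inner list of the original
print(shallow)  # [[1, 2, 99], [3, 4]]: shares the inner lists
print(deep)     # [[1, 2], [3, 4]]: fully independent
```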

## Some useful functions:

Like the `len` function, `min`, `max` and `sum` work on many collections, as long as the elements can be compared (`min`, `max`) or added up (`sum`). They are very useful for lists:

In [20]:
x = [2, 4, 7, 9]
print(len(x))
print(min(x))
print(max(x))
print(sum(x))

4
2
9
22
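
Combining `sum` and `len` gives the mean of a list of numbers:

```python
x = [2, 4, 7, 9]
mean = sum(x) / len(x)  # 22 / 4
print(mean)             # 5.5
```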


Unlike some other languages, Python does not automatically convert strings to numbers (or numbers to strings). Therefore, the `sum` function raises an error if a list contains both integers and strings:

In [21]:
x = [2, "a", 7, 9]
#print(sum(x)) #will result in a TypeError

While the functions above work on many kinds of collections, lists also have their own methods. You can get an overview using `dir(list)`. We will cover two of them (`count` and `index`):

In [22]:
my_list = [1, 4, 2, 3, 3, 4, 4]
print(my_list.count(4))
print(my_list.index(4)) # index of first occurrence

3
1
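
`list.index` raises a `ValueError` if the value is not in the list. The `in` operator lets you check for membership first. `count` simply returns 0 for a missing value:

```python
my_list = [1, 4, 2, 3, 3, 4, 4]
print(4 in my_list)      # True
print(5 in my_list)      # False
print(my_list.count(5))  # 0: count does not raise an error
#print(my_list.index(5)) # would raise a ValueError
```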


## Nested lists

You can also create lists within lists: nested lists, which can represent tables or matrices (also called multi-dimensional lists).

In [1]:
my_table = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]

# to show that it can represent a table:

my_table = [[1, 2, 3], 
            [4, 5, 6], 
            [7, 8, 9]]

print(my_table)

[[1, 2, 3], [4, 5, 6], [7, 8, 9]]


Some indexing examples:

In [5]:
print(my_table[0]) # first row
print(my_table[1]) # second row
print(my_table[1][0]) # first element of the second row

[1, 2, 3]
[4, 5, 6]
4
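
Because the inner lists are ordinary lists, you can change a single cell with two indexes. You can also collect a column by taking the same index from every row. The sketch below does this with a `for` loop, the same kind of loop used later in this lesson. The name `second_column` is just illustrative.

```python
my_table = [[1, 2, 3],
            [4, 5, 6],
            [7, 8, 9]]
my_table[2][1] = 0      # row index 2, column index 1
print(my_table[2])      # [7, 0, 9]

second_column = []
for row in my_table:    # each row is itself a list
    second_column.append(row[1])
print(second_column)    # [2, 5, 0]
```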


## Tuples

Tuples are like lists, but they are immutable. Hence, they have fewer methods than lists: methods that change the object, such as `append` and `remove`, are not available. Tuples are created using `()` instead of `[]`.  
An example:

In [23]:
base = ("A", "T", "C", "G")
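
Indexing, slicing, `len`, `count` and `index` work on tuples just as they do on lists. Only changing a tuple is not allowed:

```python
base = ("A", "T", "C", "G")
print(base[0])          # A
print(base[1:3])        # ('T', 'C'): a slice of a tuple is a tuple
print(len(base))        # 4
print(base.index("C"))  # 2
#base[0] = "U"          # would raise a TypeError: tuples are immutable
```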

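If you want a tuple with a single element, you need a trailing comma. Without it, the parentheses only group the expression and you get the element itself: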

In [24]:
result1 = ("Hello")
print(type(result1))
result2 = ("Hello",)
print(type(result2))

<class 'str'>
<class 'tuple'>


Many people who are new to Python have trouble understanding why tuples exist when lists can do the same and even more. It all has to do with the immutable nature of tuples. A few use cases:
- Keeping data integrity. For example, the standard amino acids encoded by the genetic code form a fixed set of 20, no more, no less. Storing them in a tuple is a good idea: if you work with many people on a project, you can be sure that nobody's code will accidentally change it. If you write data to a database, it is also a good idea to wrap each record in a tuple, so that it cannot be modified before it is written.
- Tuples can be used as keys in dictionaries (see below for more information). Lists cannot be used as keys because they are mutable.
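
Below is a minimal sketch of the dictionary-key use case. It uses a dictionary (introduced in the next section) whose keys are hypothetical (chromosome, position) tuples; the data is made up for illustration.

```python
# Hypothetical variant calls keyed by (chromosome, position)
variants = {("chr1", 1000): "A", ("chr2", 52): "G"}
print(variants[("chr1", 1000)])   # A

#bad = {["chr1", 1000]: "A"}     # would raise a TypeError: a list cannot be a key
```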


## Dictionaries

Dictionaries are an important collection in Python. Each item is a key-value pair: every key is coupled to a value. Each key must be unique. An example (for bioinformaticians):

In [25]:
bases = {'a': 'adenine', 't': 'thymine', 'c': 'cytosine','g': 'guanine'} #compact notation

# also allowed:

bases = {'a': 'adenine', 
         't': 'thymine', 
         'c': 'cytosine',
         'g': 'guanine'}

print(bases)

{'a': 'adenine', 't': 'thymine', 'c': 'cytosine', 'g': 'guanine'}


You can get the value of a key using indexing:

In [26]:
print(bases['a'])

adenine


If you use a key that does not exist, you will get an error:

In [27]:
#print(bases['q']) # will result in a KeyError

You can avoid this error using the `dict.get` method: 

In [28]:
print(bases.get('a'))
print(bases.get('q'))

adenine
None
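
`dict.get` also accepts a default value as a second argument, which it returns instead of `None` when the key is missing. Alternatively, the `in` operator checks whether a key exists. Note that `in` only looks at the keys:

```python
bases = {'a': 'adenine', 't': 'thymine', 'c': 'cytosine', 'g': 'guanine'}
print(bases.get('q', 'unknown'))  # unknown
print('a' in bases)               # True
print('adenine' in bases)         # False: values are not checked
```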


You can remove an item using the `del` keyword:

In [29]:
del bases['a']
print(bases)

{'t': 'thymine', 'c': 'cytosine', 'g': 'guanine'}


And you can add an item as follows:

In [30]:
bases['a'] = 'adenine'
print(bases)

{'t': 'thymine', 'c': 'cytosine', 'g': 'guanine', 'a': 'adenine'}


If you want to add multiple items you can use the update method:

In [31]:
bases = {'a': 'adenine', 't': 'thymine'}
print(bases)
bases.update({'c': 'cytosine', 'g': 'guanine'})
print(bases)

{'a': 'adenine', 't': 'thymine'}
{'a': 'adenine', 't': 'thymine', 'c': 'cytosine', 'g': 'guanine'}


You might wonder whether you can work the other way around and retrieve a key from a value. There is no direct way to look up a key by its value.
You can, however, build a new dictionary in which the keys and values are swapped:

In [32]:
print(bases)
bases_switched = {} # create empty dict
for i in bases:
    new_key = bases[i]
    new_value = i
    bases_switched[new_key] = new_value
print(bases_switched)

{'a': 'adenine', 't': 'thymine', 'c': 'cytosine', 'g': 'guanine'}
{'adenine': 'a', 'thymine': 't', 'cytosine': 'c', 'guanine': 'g'}


There are shorter ways to accomplish this. You will learn about them in informatics 2.

> Since Python 3.7, dictionaries officially preserve insertion order. You might still read that dictionaries are unordered, but that information is outdated.

## Sets

A set is like a dictionary without values: it is a collection of unique keys (elements). Sets are unordered. An example:

In [33]:
bases_dna = {"a", "t", "c", "g"}
bases_rna = {"a", "u", "c", "g"}

You can use `set` to see which of the DNA bases occur in a sequence:

In [34]:
dna = "atttgggaatg"
print(set(dna))

{'a', 't', 'g'}
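
To answer the yes/no question of whether all four DNA bases occur in a sequence, you can compare sets, for example with the `set.issubset` method:

```python
bases_dna = {"a", "t", "c", "g"}
print(bases_dna.issubset(set("atttgggaatg")))  # False: 'c' is missing
print(bases_dna.issubset(set("gattaca")))      # True: all four bases occur
```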


You can use set methods to get the union, intersection and differences. Have a look at the Venn diagram that shows some possible logical relations between different sets (source: https://en.wikipedia.org/wiki/Venn_diagram):
![Venn](./figs/fig2.png)

In [35]:
print(bases_dna.intersection(bases_rna)) # returns the set of elements that are present in both sets
print(bases_dna.union(bases_rna)) # returns the set of elements present in either set
#dir(bases_dna)
print(bases_dna.difference(bases_rna)) # returns the set of elements present in bases_dna, but not bases_rna
print(bases_rna.difference(bases_dna)) # returns the set of elements present in bases_rna, but not bases_dna

print(bases_dna.symmetric_difference(bases_rna)) # returns the set of elements present in either set, but not in both

{'a', 'g', 'c'}
{'u', 'g', 'a', 'c', 't'}
{'t'}
{'u'}
{'t', 'u'}
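
The same operations are also available as operators: `&` (intersection), `|` (union), `-` (difference) and `^` (symmetric difference). Because sets are unordered, the sketch below sorts each result into a list before printing, so the output is predictable:

```python
bases_dna = {"a", "t", "c", "g"}
bases_rna = {"a", "u", "c", "g"}
print(sorted(bases_dna & bases_rna))  # ['a', 'c', 'g']
print(sorted(bases_dna | bases_rna))  # ['a', 'c', 'g', 't', 'u']
print(sorted(bases_dna - bases_rna))  # ['t']
print(sorted(bases_dna ^ bases_rna))  # ['t', 'u']
```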


## Creating empty collections

Often, you want to create an empty collection. For most collections, this can be done either with regular syntax or with the constructor function.  
Some examples:

In [36]:
my_string1 = "" #regular syntax
my_string2 = str() #constructor function
my_list1 = [] #regular syntax
my_list2 = list() #constructor function
my_tuple1 = () #regular syntax
my_tuple2 = tuple() #constructor function
my_dict1 = {} #regular syntax
my_dict2 = dict() #constructor function
my_set = set() #constructor function

> Note that an empty set can only be created using the constructor function, because `{}` already creates an empty dictionary. For all other collections you can use both, but by convention regular syntax is preferred over the constructor function.
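
You can check this with the built-in `type` function:

```python
print(type({}))     # <class 'dict'>
print(type(set()))  # <class 'set'>
print(len(set()))   # 0
```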

## Python data structures conversions

Many data types can be converted into one another. For example, an integer can be converted to a float, a list to a string, a dictionary to a set, etc.  
Some examples:

In [37]:
x = 1
print(x)
print(float(x))

1
1.0


In [38]:
x = 1
print(x)
print(str(x))

1
1


In [39]:
x = 1
print(x)
print(bool(x)) 
y = 0
print(y)
print(bool(y)) # we will devote a whole lesson to this

1
True
0
False


In [40]:
x = ["hello", "world"]
print(x)
print(" ".join(x)) # This might be a bit confusing at first glance: the str.join method joins the items of the list into one string, using " " (a space) as separator.

['hello', 'world']
hello world
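
`str.join` only accepts collections of strings, so numbers have to be converted with `str` first. The reverse direction is available too: `str.split` cuts a string into a list of words, and `list` turns a string into a list of characters:

```python
words = "hello world".split()     # split on whitespace
print(words)                      # ['hello', 'world']
print(list("GATC"))               # ['G', 'A', 'T', 'C']
print("-".join(["1", "2", "3"]))  # 1-2-3
#print("-".join([1, 2, 3]))       # would raise a TypeError: items must be strings
```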


In [41]:
x = [1, 2, 3]
print(x)
print(tuple(x))

[1, 2, 3]
(1, 2, 3)


In [42]:
x = {'a': 't', 'c': 'g', 't': 'a', 'g': 'c'}
print(x)
print(set(x)) # only the keys are kept; remember that sets are unordered, unlike dictionaries

{'a': 't', 'c': 'g', 't': 'a', 'g': 'c'}
{'a', 't', 'g', 'c'}
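
Converting a dictionary to a set (or to a list) keeps only its keys. The `dict.keys`, `dict.values` and `dict.items` methods give you access to the other parts. A list of key-value tuples can be converted back into a dictionary with `dict`:

```python
x = {'a': 't', 'c': 'g', 't': 'a', 'g': 'c'}
print(list(x))           # ['a', 'c', 't', 'g']: keys only, in insertion order
print(list(x.values()))  # ['t', 'g', 'a', 'c']
print(list(x.items()))   # [('a', 't'), ('c', 'g'), ('t', 'a'), ('g', 'c')]
print(dict([('a', 't'), ('c', 'g')]))  # {'a': 't', 'c': 'g'}
```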


The end