# Lesson 3

Jurre Hageman

## Other data types

First we will cover some other data types that are not collections. These include:
- None
- Booleans

## None

The `None` keyword is used to define... nothing: a null value, no value. It acts as a flag for the absence of a value. So why is this useful? Imagine you are analyzing data from a digital weather station. You are recording the temperature in degrees Celsius, but the device has not booted yet. How do you store the temperature? 0 is obviously wrong, because temperature in Celsius is measured on an interval scale: there is no true zero, so 0°C is a perfectly valid reading and does not mean "no temperature". (For the same reason, 20°C is not twice as warm as 10°C; it is 10°C warmer. Temperature in Celsius is an interval scale, not a ratio scale.) You could set it to an empty string, but that would be clumsy too. `None` comes in very handy here: it makes clear that no temperature has been received yet.

In [5]:
temp = None
print(temp)
temp = 20.4
print(temp)

None
20.4
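
Because there is only one `None` object, the usual way to test for it is with `is None` rather than `==`. The sketch below is a minimal illustration; the `status` variable is hypothetical and only serves to show the check:

```python
temp = None  # no reading received yet

# Test for a missing value with "is None"
if temp is None:
    status = "no reading yet"
else:
    status = "reading received"
print(status)

temp = 0.0  # a real measurement of 0 °C is not the same as "no value"
print(temp is None)
```

This prints `no reading yet` followed by `False`: a temperature of 0 is a value, whereas `None` is the absence of one.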


## Boolean

Booleans are used very often in programming.
A Boolean has one of two values:
- `True`
- `False`

Comparison expressions in Python evaluate to `True` or `False`.  
Some examples:

In [6]:
x = 4
y = 5
print(x > y)
print(x < y)

False
True


`True` has the same value as 1 in Python while `False` has the same value as 0:

In [8]:
print(True == 1) # == tests for equality
print(False == 0)
print(True == 2)
print(True == 0)
print(False == 1)

True
True
False
False
False
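
The reason these comparisons work is that `bool` is a subclass of `int` in Python. As a small sketch (the variable names are illustrative only), this means Booleans can be added up, which is a convenient way to count how many checks passed:

```python
# bool is a subclass of int, so True and False behave like 1 and 0
is_int = isinstance(True, int)
print(is_int)

total = True + True + False  # same as 1 + 1 + 0
print(total)

# Counting how many comparisons are True
checks = [3 > 1, 2 > 5, 4 == 4]
n_true = sum(checks)
print(n_true)
```

This prints `True`, `2` and `2`.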


There is a bit more to know about Booleans. It is important to understand how data collections are evaluated by the `bool` function. More on that later.
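
As a brief preview, the sketch below shows what the built-in `bool` function returns for a few values. The general rule is that empty collections, `0` and `None` evaluate to `False`, while non-empty collections evaluate to `True`:

```python
empty_string = bool("")       # empty string
filled_string = bool("GATC")  # non-empty string
empty_list = bool([])         # empty list
list_with_zero = bool([0])    # non-empty list, even though its only item is 0
zero = bool(0)
nothing = bool(None)

print(empty_string, filled_string)
print(empty_list, list_with_zero)
print(zero, nothing)
```

This prints `False True`, `False True` and `False False`.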

## Collections: strings

We already covered some basics about strings. Strings are collections of Unicode characters. They are iterables.
An example:

In [None]:
dna = "GAATCTGG"

## String indexing

- String positions can be specified using an index
- When working from the start, indexes start at 0
- When working from the end, indexes start at -1
![indexing](figs/fig1.png)

In [9]:
dna = "GATC"
print(dna[0])
print(dna[2])
print(dna[-1])

G
T
C


What happens if you try to index a position that does not exist?

In [16]:
dna[12]

IndexError: string index out of range
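
There are two common ways to avoid this error: check the index against the length of the string first, or catch the `IndexError` when it occurs. The following is a minimal sketch; the `position` and `base` variables are hypothetical:

```python
dna = "GATC"
position = 12  # hypothetical index that lies outside the string

# Option 1: check the index before using it
if -len(dna) <= position < len(dna):
    base = dna[position]
else:
    base = None  # no base at this position
print(base)

# Option 2: try the lookup and handle the error
try:
    base = dna[position]
except IndexError:
    print("position", position, "is outside the sequence")
```

This prints `None` followed by `position 12 is outside the sequence`.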

## Manipulating strings

> Important: in Python, strings are immutable. You cannot change a string in place; trying to do so results in a TypeError. Every operation that manipulates a string returns a new string object.

In [10]:
dna = "GATC"
dna[0] = "A"

TypeError: 'str' object does not support item assignment

To change it, you need to make a new string and overwrite the variable:

In [11]:
dna = "GATC"
dna = "AATC"
print(dna)

AATC
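
In practice you rarely retype the whole string. A new string can be built from pieces of the old one, or with the `str.replace` method. The sketch below uses slicing, which is explained in the String slicing section; the names `mutated` and `replaced` are illustrative only:

```python
dna = "GATC"

# Build a new string: a new first base plus everything from index 1 onwards
mutated = "A" + dna[1:]
print(mutated)

# str.replace returns a new string with every "G" replaced by "A"
replaced = dna.replace("G", "A")
print(replaced)

# The original string is unchanged
print(dna)
```

This prints `AATC`, `AATC` and `GATC`.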


Nevertheless, you can do some operations on strings, such as concatenation (`+`) and repetition (`*`). Remember that Python always returns a new string, because strings are immutable.

In [13]:
print(dna + dna)
print(dna * 10)

AATCAATC
AATCAATCAATCAATCAATCAATCAATCAATCAATCAATC


### String slicing

You can use slicing to create a substring.
- `seq[x:y:z]` will return a slice starting at, and including, index x (or 0 if omitted) up to, but excluding, index y (or the end if omitted), taking every z-th character (or every character if z is omitted)
- A slice will always return the same data type. This is different from indexing. Although indexing a string will always return a string, indexing a list (see below) will not necessarily return a list.
Some examples:

In [15]:
seq = "GATC"
print(seq[1:]) # from pos 1 to the end (including the last character)
print(seq[:3]) # from pos 0 to 3 (not including 3)
print(seq[::2]) # start to end (including end) with step 2
print(seq[::-1]) # reverse a string

ATC
GAT
GT
CTAG
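
Unlike indexing, slicing does not raise an `IndexError` when a boundary lies outside the string: the slice is simply clipped to what is available. A small sketch with a hypothetical longer sequence:

```python
seq = "GATCGATTACA"  # hypothetical sequence of 11 bases

first_codon = seq[0:3]   # bases at index 0, 1 and 2
second_codon = seq[3:6]  # bases at index 3, 4 and 5
last_three = seq[-3:]    # negative indexes also work in slices
clipped = seq[8:50]      # end is beyond the string, so the slice stops at the end
empty = seq[20:]         # start is beyond the string, so the result is empty

print(first_codon, second_codon, last_three)
print(clipped)
print(len(empty))
```

This prints `GAT CGA ACA`, `ACA` and `0`.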


## The `len` function

The `len` function works on all collections. It returns the number of items in the collection.

In [18]:
seq = "GATC"
print(len(seq))

4
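
Because indexes start at 0, the last valid index is always `len(seq) - 1`. The sketch below shows this, and also that `len` works on an empty string and on a list (lists are introduced in the next section):

```python
seq = "GATC"

# The last character sits at index len(seq) - 1, the same item as seq[-1]
last = seq[len(seq) - 1]
print(last)
print(last == seq[-1])

# len works on other collections too
n_empty = len("")
n_items = len(["atc", "agg", "ttt"])
print(n_empty)
print(n_items)
```

This prints `C`, `True`, `0` and `3`.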


## Lists

- Lists are … lists (of objects)
- They are also frequently called arrays (in other languages)
- They contain ordered series of objects: characters, strings, numbers, or any other Python object
- They are quite similar to strings in many ways but with one important exception: they are mutable
- Lists are created using square brackets, with elements separated by commas
- You can also use indexing and slicing on lists. 

In [21]:
my_empty_list = []
sequences = ['atc', 'agg', 'ttt']
print(sequences[1])
print(sequences[1:]) # slicing
print(sequences[1:2]) # slicing. Note that a slice always returns the same data type. A list with a single element here.

agg
['agg', 'ttt']
['agg']
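
A list may hold objects of different types, including other lists. The following sketch, using a hypothetical `mixed` list, shows that indexing returns the item itself (whatever its type), while a slice always returns a list:

```python
# A hypothetical list holding several different kinds of objects
mixed = ["atc", 42, 3.14, None, ["nested", "list"]]

item = mixed[1]     # indexing returns the item itself
piece = mixed[1:2]  # slicing returns a list, even with a single item
print(type(item))
print(type(piece))

# Indexing twice reaches into a nested list
inner = mixed[4][0]
print(inner)
```

This prints `<class 'int'>`, `<class 'list'>` and `nested`.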


## List manipulations

You can append to an existing list:

In [23]:
sequences = ['atc', 'agg', 'ttt']
print(sequences)
sequences.append("ggg")
print(sequences)

['atc', 'agg', 'ttt']
['atc', 'agg', 'ttt', 'ggg']


Or concatenate two existing lists:

In [24]:
sequences = ['atc', 'agg', 'ttt']
more_sequences = ['ccc', 'ggg']
total_sequences = sequences + more_sequences
print(total_sequences)

['atc', 'agg', 'ttt', 'ccc', 'ggg']


Change the content:

In [31]:
total_sequences = ['atc', 'agg', 'ttt', 'ccc', 'ggg']
total_sequences = ["aaa"] + total_sequences # use concatenation to prepend
print(total_sequences)
total_sequences[1] = 'ata' # replace item
print(total_sequences)
total_sequences.remove('ccc') # remove by value (first occurrence)
print(total_sequences)
del total_sequences[1] # remove by index
print(total_sequences)

['aaa', 'atc', 'agg', 'ttt', 'ccc', 'ggg']
['aaa', 'ata', 'agg', 'ttt', 'ccc', 'ggg']
['aaa', 'ata', 'agg', 'ttt', 'ggg']
['aaa', 'agg', 'ttt', 'ggg']


You can also delete or insert multiple items:

In [35]:
my_list = ["atc", "atg"]
my_list[1:1] = ["ccc", "ggg"] # insert multiple items at index
print(my_list)
my_list[1:3] = ["aaa", "ttt"] # change multiple items
print(my_list)
my_list[1:3] = [] # delete multiple items
print(my_list)


['atc', 'ccc', 'ggg', 'atg']
['atc', 'aaa', 'ttt', 'atg']
['atc', 'atg']
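
Lists also have methods that cover the most common of these operations. As an alternative sketch, `list.insert`, `list.extend` and `list.pop` change the list in place; the variable names `removed_last` and `removed_first` are illustrative only:

```python
my_list = ["atc", "atg"]

my_list.insert(1, "ccc")        # insert a single item at index 1
print(my_list)

my_list.extend(["ggg", "ttt"])  # append all items of another list
print(my_list)

removed_last = my_list.pop()    # remove and return the last item
removed_first = my_list.pop(0)  # remove and return the item at index 0
print(removed_last, removed_first)
print(my_list)
```

This prints `['atc', 'ccc', 'atg']`, `['atc', 'ccc', 'atg', 'ggg', 'ttt']`, `ttt atc` and `['ccc', 'atg', 'ggg']`.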


Making a copy of a list often leads to confusion among new Python programmers.  
The following code will **not** copy items from a list:

In [1]:
list1 = [1, 2, 3]
list2 = list1 # not a copy. list1 and list2 now point to the same object in memory
print(list1)
print(list2)
list2.append(4) # add 4 to the end
print(list2)
print(list1) # list1 points to the same object

[1, 2, 3]
[1, 2, 3]
[1, 2, 3, 4]
[1, 2, 3, 4]


If you want to copy items from a list you can use a slice:

In [2]:
list1 = [1, 2, 3]
list2 = list1[:] # slices all items from list1 to a new list object
print(list1)
print(list2)
list2.append(4) # add 4 to the end
print(list2)
print(list1) 

[1, 2, 3]
[1, 2, 3]
[1, 2, 3, 4]
[1, 2, 3]


Or you can use the `list.copy` method:

In [3]:
list1 = [1, 2, 3]
list2 = list1.copy()
print(list1)
print(list2)
list2.append(4) # add 4 to the end
print(list2)
print(list1) 

[1, 2, 3]
[1, 2, 3]
[1, 2, 3, 4]
[1, 2, 3]
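
You can check whether two names refer to the same object with the `is` operator, whereas `==` only compares the contents. A minimal sketch (the names `alias` and `copied` are illustrative only):

```python
list1 = [1, 2, 3]
alias = list1          # a second name for the same list object
copied = list1.copy()  # a new list object with the same items

same_object = alias is list1
print(same_object)

copy_is_same_object = copied is list1
print(copy_is_same_object)

copy_is_equal = copied == list1
print(copy_is_equal)
```

This prints `True`, `False` and `True`: the copy has equal contents but is a separate object.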


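> The `list.copy` method makes a shallow copy: the new list is a separate object, but any nested objects inside it are still shared with the original. We will not cover deep copies in detail here, but the `deepcopy` function in the standard-library `copy` module can make them.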
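
As a brief illustration of the difference, the sketch below copies a hypothetical nested list in both ways and then changes one of the inner lists. Only the deep copy is unaffected:

```python
import copy

# A hypothetical list that contains other lists
nested = [["atc", "agg"], ["ttt"]]

shallow = nested.copy()       # new outer list, inner lists are shared
deep = copy.deepcopy(nested)  # new outer list and new inner lists

nested[0].append("ccc")       # change an inner list of the original

print(shallow[0])  # the shallow copy sees the change
print(deep[0])     # the deep copy does not
```

This prints `['atc', 'agg', 'ccc']` followed by `['atc', 'agg']`.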

## Some useful functions

Like the `len` function, `min` and `max` work on collections whose items can be compared, and `sum` works on collections of numbers. They are very useful for lists:

In [4]:
x = [2, 4, 7, 9]
print(len(x))
print(min(x))
print(max(x))
print(sum(x))

4
2
9
22
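
`min` and `max` are not limited to numbers. On strings they compare characters by their Unicode code point, which for uppercase or lowercase letters alone amounts to alphabetical order. A short sketch with illustrative variable names:

```python
seq = "GATC"
smallest_base = min(seq)  # the character that sorts first
largest_base = max(seq)   # the character that sorts last
print(smallest_base, largest_base)

names = ["ttt", "atc", "ggg"]
first_name = min(names)   # strings in a list are compared alphabetically
last_name = max(names)
print(first_name, last_name)
```

This prints `A T` followed by `atc ttt`.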


Unlike some other languages, Python does not implicitly convert (coerce) strings to numbers, so mixing them in a sum raises an error:

In [6]:
x = [2, "a", 7, 9]
print(sum(x))

TypeError: unsupported operand type(s) for +: 'int' and 'str'
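
If a string does represent a number, you have to convert it explicitly, for example with `int`. The sketch below assumes a hypothetical list in which one number was stored as the string `"4"`. A string such as `"a"` cannot be converted this way and would raise a `ValueError`.

```python
x = [2, "4", 7, 9]  # hypothetical list: one number is stored as a string

total = 0
for item in x:
    total += int(item)  # convert each item to an integer explicitly
print(total)
```

This prints `22`.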

The end