# Day 1: Sequences
Working only with single values, as we did before, will not get us very far. We will therefore move on to Python's built-in data types that can store multiple values. The main ones covered here are **lists**, **tuples**, and **sets**. Lists and tuples are true *sequential data types*, while sets are unordered collections (more on that below). Strings also behave like sequential data types in some respects, because a string is essentially an ordered series of characters.

## Lists and Tuples
### Overview
Lists are among the most useful and frequently used data types in Python. A list can store any kind of object, even other lists or tuples, and its elements can easily be accessed and modified. We define lists with square brackets `[]` or the `list()` function.

In [25]:
lst1 = [1,2,3]
lst2 = ["A", "B", 3, 4.5]
lst3 = ["a list of lists", lst1, lst2]

# list() function
lst4 = list(("A", "B", 3, 4))

# string -> list conversion
lst5 = list("ATTGCTT")

# a useful trick for the print() function is to use its sep argument with the newline character "\n"
# to print each object to a new line
print(lst1, lst2, lst3, lst4, lst5, sep = "\n")

[1, 2, 3]
['A', 'B', 3, 4.5]
['a list of lists', [1, 2, 3], ['A', 'B', 3, 4.5]]
['A', 'B', 3, 4]
['A', 'T', 'T', 'G', 'C', 'T', 'T']


As we've seen above, we can easily split a string into a list of its characters with `list()`. We can also go the other way and convert a list of strings into a single string with the `.join()` method, using the syntax `separator.join(list)`.

In [47]:
# convert string into list
sequence = "ATGGTTTACTG"
lst = list(sequence)
print(lst)

# convert list into string with custom separator
str1 = "".join(lst)
str2 = "; ".join(lst)

print(str1, str2, sep = "\n")

# this also works but is usually not what we want
str3 = str(lst)
print(str3)

['A', 'T', 'G', 'G', 'T', 'T', 'T', 'A', 'C', 'T', 'G']
ATGGTTTACTG
A; T; G; G; T; T; T; A; C; T; G
['A', 'T', 'G', 'G', 'T', 'T', 'T', 'A', 'C', 'T', 'G']
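
One detail worth knowing about `.join()` is that it only accepts strings, so a list containing numbers cannot be joined directly. The following is a minimal sketch of that behaviour and one possible workaround. The variable names `mixed` and `joined` are made up for this illustration, and `try`/`except` is used only so that the example keeps running after the error.

```python
# a list that mixes strings and numbers
mixed = ["A", "B", 3, 4.5]

# .join() only works on strings, so joining this list directly fails
try:
    joined = ", ".join(mixed)
except TypeError as error:
    print("TypeError:", error)

# converting every element with str() first makes the join work
joined = ", ".join(str(item) for item in mixed)
print(joined)  # A, B, 3, 4.5
```

Converting each element with `str()` is just one way to handle this; depending on the data, it may be cleaner to make sure the list only contains strings in the first place.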


Tuples are very similar to lists, with one key difference: we can't change them after their creation. They are **immutable**, while lists are **mutable** (we will see examples of that soon). Under the hood, this makes tuples slightly more memory-efficient and code involving tuples a little faster. It is therefore good practice to ask whether you really need a list or could use a tuple instead, but we will not worry about this for now, as the difference only matters for very large data sets or complex programs.

We define tuples with comma-separated values where we usually put round brackets `()` around them for clarity, although they are technically not always needed. Alternatively, we can use the `tuple()` function.

In [24]:
# this is usually how you define a tuple
tup1 = (1, 2, "A", 3.5)
tup2 = tuple((3, 1, ["A", "B"]))

# this also works
tup3 = 1, 2, "A", 3.5

# we can easily convert lists or strings to tuples
lst = [1, 2, 3]
tup4 = tuple(lst)
tup5 = tuple("ATTGCTT")

print(tup1, tup2, tup3, tup4, tup5, sep = "\n")

(1, 2, 'A', 3.5)
(3, 1, ['A', 'B'])
(1, 2, 'A', 3.5)
(1, 2, 3)
('A', 'T', 'T', 'G', 'C', 'T', 'T')
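
To make the difference between mutable and immutable more concrete, here is a small sketch that tries to replace the first element of a list and of a tuple. It uses the index operator `[0]` to select the first element, and `try`/`except` only to keep the example running after the error. The last lines illustrate the earlier remark that the commas, not the round brackets, make a tuple. All variable names are chosen just for this example.

```python
lst = [1, 2, 3]
tup = (1, 2, 3)

# lists are mutable: replacing an element works
lst[0] = 100
print(lst)  # [100, 2, 3]

# tuples are immutable: the same assignment raises a TypeError
try:
    tup[0] = 100
except TypeError as error:
    print("TypeError:", error)

# the tuple is unchanged
print(tup)  # (1, 2, 3)

# the comma makes the tuple, not the brackets
not_a_tuple = (5)   # just the integer 5
single = (5,)       # a tuple with one element
print(type(not_a_tuple), type(single))
```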


### Indexing
When dealing with lists or tuples we often want to access specific items. For this, we can use the index operator `[]` to either select single values or subsets of the list (= *slicing*):

* `[index]`: get element at specified index (the first position is defined as 0)
* `[start:stop]`: get all elements between position start and stop (stop itself is not included)
* `[start:stop:step]`: get all elements between position start and stop with a specific step size

Some useful tricks:

* You can also index from the end by putting a `-` in front of the index. The last element would be -1, the second last -2 and so on.
* If start or stop are not specified, they will be set to the first index, or last index + 1, respectively (see examples)

In [42]:
alphabet = list("ABCDEFGHIJKLMNOPQRSTUVWXYZ")
print(alphabet)

# first element
print(alphabet[0])

# last element
print(alphabet[-1])

# all elements up to the 13th
print(alphabet[0:13]) # or...
print(alphabet[:13])

# every second element
print(alphabet[::2])

# all elements except the last 2
print(alphabet[:-2])


['A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I', 'J', 'K', 'L', 'M', 'N', 'O', 'P', 'Q', 'R', 'S', 'T', 'U', 'V', 'W', 'X', 'Y', 'Z']
A
Z
['A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I', 'J', 'K', 'L', 'M']
['A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I', 'J', 'K', 'L', 'M']
['A', 'C', 'E', 'G', 'I', 'K', 'M', 'O', 'Q', 'S', 'U', 'W', 'Y']
['A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I', 'J', 'K', 'L', 'M', 'N', 'O', 'P', 'Q', 'R', 'S', 'T', 'U', 'V', 'W', 'X']
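
A few more slicing patterns may help to see how start, stop, and step interact. This sketch reuses the `alphabet` list from above (redefined here so the block runs on its own); the variable `tup` is introduced only for illustration.

```python
alphabet = list("ABCDEFGHIJKLMNOPQRSTUVWXYZ")

# the last three elements (negative start, no stop)
print(alphabet[-3:])      # ['X', 'Y', 'Z']

# every third element from index 2 (included) to index 10 (excluded)
print(alphabet[2:10:3])   # ['C', 'F', 'I']

# a negative step walks backwards, so [::-1] reverses the list
reversed_alphabet = alphabet[::-1]
print(reversed_alphabet[:3])  # ['Z', 'Y', 'X']

# slicing a tuple works the same way and returns a tuple
tup = tuple(alphabet)
print(tup[:3])            # ('A', 'B', 'C')

# a slice that runs past the end is simply cut off at the last element
print(alphabet[20:100])   # ['U', 'V', 'W', 'X', 'Y', 'Z']
```

Note that only slices are this forgiving: a single index outside the valid range, such as `alphabet[100]`, raises an `IndexError`.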


Note: We can also use index operators with strings:

In [43]:
sequence = "ATGCTGGGTTCT"
print(sequence[:3])

ATG
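
The same slicing rules apply to strings, and a slice of a string is again a string. As a small sketch, the following picks individual three-letter blocks out of the sequence from above. The variable names, and reading the blocks as codons, are assumptions made just for this illustration.

```python
sequence = "ATGCTGGGTTCT"

# positions 0-2, 3-5, and the last three characters
first_codon = sequence[:3]
second_codon = sequence[3:6]
last_codon = sequence[-3:]
print(first_codon, second_codon, last_codon)  # ATG CTG TCT

# a negative step reverses the string
reversed_sequence = sequence[::-1]
print(reversed_sequence)  # TCTTGGGTCGTA
```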


### Sets
Sets are a special data type that contains only **unique** and **immutable** objects; they are Python's implementation of mathematical sets (German: "Mengen"). Strictly speaking, sets are not sequential data types, because they are **unordered**. In lists and tuples, the order of the items matters and you can access items by their position. With sets this is not possible; they are typically used to find the unique values in a list or tuple, or the overlap between two collections.

We will see differences and applications of all these data types in the next section. For now, it is just important to note that we define sets with curly brackets `{}` and that they only ever contain unique values.

In [33]:
# set definition
set1 = {1, 3, 2}
set2 = set(("A", "C", "B", 2, 3, 1))

# sets do not keep order information, so the printed order may differ from the order of definition
print(set1, set2, sep = "\n")

{1, 2, 3}
{1, 2, 3, 'A', 'B', 'C'}
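
Because sets are unordered, the index operator does not work on them. The following sketch shows the resulting error and two things we can do instead: test for membership with `in`, or turn the set into a sorted list with `sorted()`. The variable name `ordered` is made up for this example, and `try`/`except` is used only to keep the example running.

```python
set1 = {1, 3, 2}

# sets have no positions, so indexing raises a TypeError
try:
    set1[0]
except TypeError as error:
    print("TypeError:", error)

# membership tests work fine
print(2 in set1)  # True
print(5 in set1)  # False

# sorted() returns a list with a defined order
ordered = sorted(set1)
print(ordered)     # [1, 2, 3]
print(ordered[0])  # 1
```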


In [31]:
# sets can only contain immutable objects, so lists are not allowed
set3 = {1, 2, 3, [3, 4]}

TypeError: unhashable type: 'list'

In [34]:
# instead of lists, we can use a tuple
set3 = {1, 2, 3, (3, 4)}
print(set3)

{1, 2, 3, (3, 4)}


In [30]:
# sets do not contain duplicates
set4 = {1, 1, 1, 1, 2, 2, 3}

# we can easily convert other sequences into sets
lst = ["A", "A", 1, 2, 2]
set5 = set(lst)
set6 = set("ATTGCTT")

print(set4, set5, set6, sep = "\n")

{1, 2, 3}
{1, 2, 'A'}
{'T', 'G', 'A', 'C'}
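
Finding overlaps between collections, mentioned above as a typical use of sets, can be sketched with the set operators `&` (intersection), `|` (union), and `-` (difference). The two lists of gene names below are invented example data, and the results are passed through `sorted()` only to get a reproducible print order.

```python
# hypothetical example data: genes detected in two samples
sample_a = ["TP53", "BRCA1", "MYC", "TP53"]
sample_b = ["MYC", "EGFR", "BRCA1"]

# converting to sets removes the duplicate "TP53"
genes_a = set(sample_a)
genes_b = set(sample_b)

shared = genes_a & genes_b     # in both samples
combined = genes_a | genes_b   # in at least one sample
only_a = genes_a - genes_b     # in sample A but not in sample B

print(sorted(shared))    # ['BRCA1', 'MYC']
print(sorted(combined))  # ['BRCA1', 'EGFR', 'MYC', 'TP53']
print(sorted(only_a))    # ['TP53']
```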


## Accessing or modifying items
### Indexing
When dealing with sequential data types such as strings, lists, or tuples, we often want to access specific elements.