# Day 1: Lists and Tuples
Working only with single values, as we did before, will not get us very far. We will therefore move on to Python's built-in *sequential data types*, which can store as many values as you want in a single variable. The main sequential data types in Python are **tuples** and **lists**. Strings also behave like sequential data types in some respects, because a string essentially stores a sequence of single characters.

## Creating lists and tuples
Lists are among the most useful and frequently used data types in Python. A list can store any kind of object, even other sequential data types, and list elements can be easily accessed and manipulated. We define lists with square brackets `[]` or the `list()` function.

In [25]:
list_1 = [1, 2, 3]
list_2 = ["A", 3, 4.5, True]
list_3 = ["a list of lists", list_1, list_2]

# list() function
list_4 = list(("A", "B", 3, 4))

# string -> list conversion
list_5 = list("ATTGCTT")

print(list_1, list_2, list_3, list_4, list_5, sep="\n")

[1, 2, 3]
['A', 3, 4.5, True]
['a list of lists', [1, 2, 3], ['A', 3, 4.5, True]]
['A', 'B', 3, 4]
['A', 'T', 'T', 'G', 'C', 'T', 'T']


As we've seen above, we can easily split a string into a list of its characters with `list()`. We can also convert a list back into a string with the string method `.join()`, using the syntax `separator.join(list)`. This is particularly useful when dealing with DNA or amino acid sequences because we usually want to print them as a string but edit them as a list.

In [15]:
# convert string into list
sequence = "ATGGTTTACTG"
lst = list(sequence)
print(lst)

# convert list into string with custom separator
str_1 = "".join(lst)
str_2 = "; ".join(lst)

print(str_1, str_2, sep = "\n")

# this also works but is usually not what we want when trying to convert a list
# into a string
str_3 = str(lst)
print(str_3)

['A', 'T', 'G', 'G', 'T', 'T', 'T', 'A', 'C', 'T', 'G']
ATGGTTTACTG
A; T; G; G; T; T; T; A; C; T; G
['A', 'T', 'G', 'G', 'T', 'T', 'T', 'A', 'C', 'T', 'G']
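
One detail that the example above does not show: `.join()` only accepts strings, so a list containing numbers has to be converted element by element first. The following is a minimal sketch; the list `positions` and its values are made up for illustration.

```python
# hypothetical example data: a list of integers
positions = [3, 17, 42]

# ", ".join(positions) would raise a TypeError because the elements
# are not strings, so we convert each element with str() first
joined = ", ".join(str(p) for p in positions)
print(joined)
```

This should print `3, 17, 42`.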


Tuples are very similar to lists; the key difference is that we can't change them after they are created. They are **immutable**, while lists are **mutable** (we will see examples of that soon). Under the hood, this makes storing tuples more efficient, and code involving tuples is a little faster. Those differences only start to matter for very large data sets or complex programs, so we will just use lists for everything in this course.

We define tuples with comma-separated values where we usually put round brackets `()` around them for clarity, although they are technically not always needed. Alternatively, we can use the `tuple()` function.

In [16]:
# this is usually how you define a tuple
tup_1 = (False, 2, "A", 3.5)
tup_2 = tuple((3, 1, ["A", "B"]))

# this also works
tup_3 = False, 2, "A", 3.5

# we can easily convert lists or strings to tuples
lst = [1, 2, 3]
tup_4 = tuple(lst)
tup_5 = tuple("ATTGCTT")

print(tup_1, tup_2, tup_3, tup_4, tup_5, sep = "\n")

(False, 2, 'A', 3.5)
(3, 1, ['A', 'B'])
(False, 2, 'A', 3.5)
(1, 2, 3)
('A', 'T', 'T', 'G', 'C', 'T', 'T')
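
To illustrate what immutability means in practice, here is a small sketch (the variable names are chosen just for this example). It also shows one common pitfall: a tuple with a single element needs a trailing comma, otherwise the round brackets are treated as ordinary parentheses.

```python
tup = (1, 2, 3)

# trying to reassign an element of a tuple raises a TypeError
try:
    tup[0] = 10
except TypeError as err:
    print("TypeError:", err)

# the tuple is unchanged
print(tup)

# a tuple with one element needs a trailing comma
single = (5,)
not_a_tuple = (5)   # this is just the integer 5
print(type(single), type(not_a_tuple))
```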


## Indexing
When dealing with lists or tuples we often want to access specific items. For this, we can use the index operator `[]` to either select single values or subsets of the list (= *slicing*):

* `[index]`: get element at specified index (the first position is defined as 0)
* `[start:stop]`: get all elements between position start and stop (stop itself is not included)
* `[start:stop:step]`: get all elements between position start and stop with a specific step size

Some useful tricks:

* You can also index from the end by putting a `-` in front of the index. The last element has index -1, the second-to-last -2, and so on.
* If start or stop is not specified, it defaults to the first index or to the last index + 1, respectively (see examples)

In [17]:
lst = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
print(lst)

# first element
print(lst[0])

# last element
print(lst[-1])

# all elements up to the 5th
print(lst[0:5]) # or...
print(lst[:5])

# all elements from the 6th on
print(lst[5:10]) # or...
print(lst[5:])

# only each second element
print(lst[::2])

# all elements except the last 2
print(lst[:-2])

[1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
1
10
[1, 2, 3, 4, 5]
[1, 2, 3, 4, 5]
[6, 7, 8, 9, 10]
[6, 7, 8, 9, 10]
[1, 3, 5, 7, 9]
[1, 2, 3, 4, 5, 6, 7, 8]
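
A few more slicing patterns, sketched on the same list of numbers: the step can be negative (which walks through the list backwards), and start, stop, and step can be freely combined.

```python
lst = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]

# a negative step reverses the direction: reversed copy of the list
reversed_lst = lst[::-1]
print(reversed_lst)

# every third element, starting at the second element (index 1)
every_third = lst[1::3]
print(every_third)

# the last three elements
last_three = lst[-3:]
print(last_three)
```

Note that slicing always returns a new list and leaves `lst` itself unchanged.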


We can also use index operators with strings:

In [18]:
sequence = "ATGCTGGGTTCT"
print(sequence[:3])

ATG
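
Reading from a string by index works just like with lists, but strings are immutable, so assigning to an index does not work. The sketch below shows two possible ways around this for changing the first base of a sequence; the names `mutated_1`, `mutated_2`, and `bases` are only illustrative.

```python
sequence = "ATGCTGGGTTCT"

# sequence[0] = "C" would raise a TypeError because strings are immutable

# option 1: build a new string from slices
mutated_1 = "C" + sequence[1:]

# option 2: convert to a list, edit in place, and join back into a string
bases = list(sequence)
bases[0] = "C"
mutated_2 = "".join(bases)

print(mutated_1, mutated_2, sep="\n")
```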


You can use the index operator not only to access values but also to **modify** values in the list by assigning new values to the specified positions. Keep in mind that when you assign to multiple positions at once (a slice), you need to put the new elements into a list as well.

In [28]:
mylist = [1, 2, 2, 4, 5, 7, 8]

# assign a single element
mylist[2] = 3
print(mylist)

# assign multiple elements
mylist[-2:] = [6, 7]
print(mylist)

[1, 2, 3, 4, 5, 7, 8]
[1, 2, 3, 4, 5, 6, 7]
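
In the example above, the slice and the new list have the same length, but that is not required. As a small sketch (with made-up values), assigning a longer or shorter list to a slice changes the length of the list:

```python
mylist = [1, 2, 3, 4, 5]

# replace two elements with three: the list grows by one
mylist[1:3] = ["a", "b", "c"]
print(mylist)
grown = mylist.copy()

# assigning an empty list to a slice deletes those elements
mylist[1:4] = []
print(mylist)
```

This should print `[1, 'a', 'b', 'c', 4, 5]` followed by `[1, 4, 5]`.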


## Specific methods and functions
Python provides many methods and functions dedicated to working with lists. Methods are usually called in the form `mylist.method()`, and if the method requires an argument you just write it between the parentheses. The most interesting ones for us are listed below:

**Adding/removing values:**
* `.append()`: adds an element to the end of the list
* `+` or `.extend()`: adds the elements of a list to another list (`+` creates a new list, `.extend()` modifies the list in place)
* list `*` n: replicates the list n times
* `.insert()`: inserts an element at the specified index
* `.pop()`: removes and returns the element at the specified index (the last element if no index is given)
* `.remove()`: removes the first occurrence of an item

**Finding elements:**
* `.index()`: returns the index of the first occurrence of an item
* `.count()`: returns the number of times an item occurs in the list
* item `in` list: returns `True` if an item is in the list, `False` if it is not

**Other useful functions:**
* `.copy()`: returns a copy of the list
* `len()`: returns the number of elements in the list (= length)

Now, a few examples of how these are used in code:

In [22]:
## Adding/removing values:
sequence = list("AA")

# add the elements ["G", "G"], ["T", "T"], and ["C", "C"] using
# three different methods
sequence.append("G")
sequence.append("G")
sequence.extend(["T", "T"]) # alternative: sequence.extend(list("TT"))
sequence = sequence + ["C", "C"]

print(sequence)
print("".join(sequence))

# replicate the list 3 times
sequence = sequence * 3
print(sequence)
print("".join(sequence))

print("-"*20)

#--------------------------

sequence = list("A"*6)
print(sequence)
print("".join(sequence))

# insert a G at the second position (remember: indexes start at 0 in Python)
sequence.insert(1, "G")
print(sequence)
print("".join(sequence))

# remove the inserted G from the sequence
sequence.pop(1)
print(sequence)
print("".join(sequence))

# change nucleotides 2-4 to Gs
sequence[1:4] = ["G", "G", "G"]
print(sequence)
print("".join(sequence))

['A', 'A', 'G', 'G', 'T', 'T', 'C', 'C']
AAGGTTCC
['A', 'A', 'G', 'G', 'T', 'T', 'C', 'C', 'A', 'A', 'G', 'G', 'T', 'T', 'C', 'C', 'A', 'A', 'G', 'G', 'T', 'T', 'C', 'C']
AAGGTTCCAAGGTTCCAAGGTTCC
--------------------
['A', 'A', 'A', 'A', 'A', 'A']
AAAAAA
['A', 'G', 'A', 'A', 'A', 'A', 'A']
AGAAAAA
['A', 'A', 'A', 'A', 'A', 'A']
AAAAAA
['A', 'G', 'G', 'G', 'A', 'A']
AGGGAA
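
The cell above does not cover `.remove()`, and it discards the value that `.pop()` gives back. The following sketch, using a short made-up sequence, shows both: `.pop()` returns the element it removes, while `.remove()` takes a value rather than a position.

```python
sequence = list("ATGCG")

# .pop() returns the removed element; without an index it takes the last one
last = sequence.pop()
first = sequence.pop(0)
print(last, first, sequence)

# .remove() deletes the first element that matches the given value
sequence = list("ATGCG")
sequence.remove("G")
print("".join(sequence))
```

This should print `G A ['T', 'G', 'C']` and then `ATCG`: only the first G is removed.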


In [20]:
## Finding elements
sequence = list("TTCGTTTCTGCTGATGCTT")

# position of the first C
print(sequence.index("C"))

# how many Ts are in the sequence?
print(sequence.count("T"))

# do we have an A in the sequence?
print("A" in sequence)

# tip: this also works for strings
"A" in "TCGTTTA"

2
10
True


True
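
One thing to watch out for: `.index()` raises a `ValueError` if the item is not in the list, whereas `.count()` simply returns 0. A possible pattern, sketched here with a shortened example sequence, is to check with `in` first. The sketch also uses the optional start argument of `.index()` to find a later occurrence.

```python
sequence = list("TTCGTTTC")

# .index() raises a ValueError when the item is missing,
# so we check with `in` before calling it
if "A" in sequence:
    pos_a = sequence.index("A")
else:
    pos_a = None
print(pos_a)

# .count() is safe to call for missing items
print(sequence.count("A"))

# .index() accepts an optional start position: find the second C
first_c = sequence.index("C")
second_c = sequence.index("C", first_c + 1)
print(first_c, second_c)
```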

In [23]:
## Other useful functions
sequence = list("TCGAGGGTCGATCGGGTCGAAGCT")

# how long is our sequence?
print(len(sequence))

# copying the sequence and modifying it without changing the original
sequence2 = sequence.copy()
sequence2.extend(list("TTTAAAGGG"))

print("".join(sequence), "".join(sequence2), sep = "\n")

24
TCGAGGGTCGATCGGGTCGAAGCT
TCGAGGGTCGATCGGGTCGAAGCTTTTAAAGGG
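
Why is `.copy()` needed at all? Because lists are mutable, a plain assignment does not create a second list; it only gives the same list a second name. The sketch below (with illustrative variable names) contrasts the two:

```python
original = list("ATG")

alias = original               # a second name for the same list
independent = original.copy()  # a new list with the same elements

# modifying the alias also changes the original
alias.append("C")

print(original)
print(independent)
print(alias is original, independent is original)
```

Here `original` ends up as `['A', 'T', 'G', 'C']` while `independent` stays `['A', 'T', 'G']`. Note that `.copy()` makes a shallow copy: nested lists inside the list are still shared between the copy and the original.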
