## Indexing and slicing
We have seen multiple data types that are collections, such as the string, list and tuple. Many methods are available for working with these collections, such as the `str.upper()` method, which creates an uppercase copy of a string, or the `list.append()` method, which adds an element to the end of a list. But what if we want to retrieve or change specific elements in a collection?

To get a portion of a collection we can use **indexing** and **slicing**. Both are explained in this Notebook. First, we have to specify what indexing means. Imagine that every element in a collection has a number associated with it. The **first element** gets the number 0, and the number is incremented (1 is added) for each following element until the end of the collection. This number is called the *index*, and we can use it to retrieve or change the element.

Collections are not only numbered from left to right (with positive indexes), but can also be numbered from right to left. In that case, we start at the rightmost element with index **-1** and decrement the index (subtract 1) for every step to the left.

To illustrate this, have a look at the figure below, which depicts a string with the letters GAATCTGG. When working from the start, the indexes start at 0. When working from the end, the indexes start at -1. To use indexing on a collection, we place the index we want between **square brackets `[]`**.
![string indexing](figs/indexing.png)

In the code example below we demonstrate (both positive and negative) indexing on a string, but this also works on lists and tuples.

In [1]:
dna = 'GATC'

print(dna[0]) # print the first element
print(dna[2]) # print the third element
print(dna[-1]) # prints the last element

G
T
C
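
The same notation can be tried on a list and a tuple. The sketch below is only an illustration: the names `bases` and `codon` are made up for this example. It also shows that a negative index `-k` points at the same element as the positive index `len(collection) - k`.

```python
# Indexing works the same way on lists and tuples as on strings.
bases = ['G', 'A', 'T', 'C']     # a list (illustrative data)
codon = ('atg', 'taa', 'ttt')    # a tuple (illustrative data)

first_base = bases[0]    # first element of the list
last_codon = codon[-1]   # last element of the tuple

# Index -1 and index len(bases) - 1 refer to the same element.
same_element = bases[-1] == bases[len(bases) - 1]

print(first_base)    # G
print(last_codon)    # ttt
print(same_element)  # True
```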


An error that you will probably see many times is the one raised when you try to access an element of a collection that does not exist. Using an index outside the range of valid indexes raises an **index out of range error** (`IndexError`), as shown in the code example below.

In [2]:
print(dna[4]) # there is no index 4: the valid indexes are 0 to 3

IndexError: string index out of range
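
One way to avoid this error is to compare the index with the length of the collection before using it, or to catch the error when it occurs. The sketch below is a minimal illustration that assumes you are already familiar with `if` statements and `try`/`except`. The variable `position` is a hypothetical index chosen for this example.

```python
dna = 'GATC'
position = 4  # hypothetical index that lies outside the string

# Valid indexes run from -len(dna) up to and including len(dna) - 1.
if -len(dna) <= position < len(dna):
    result = dna[position]
else:
    result = None  # no element at this position
print(result)  # None

# Alternatively, try the lookup and handle the error if it is raised.
try:
    base = dna[position]
except IndexError as error:
    message = str(error)
    print("Could not read position", position, "-", message)
```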

Important: in Python, strings are immutable. You cannot change a string in place; trying to do so results in a TypeError. Operations that seem to modify a string return a new string object instead.

Important: in Python, tuples are immutable as well. After a tuple has been created it cannot be changed anymore, and trying to do so results in a TypeError.

In [3]:
dna = "GATC"
dna[0] = "A" # results in a TypeError

TypeError: 'str' object does not support item assignment

In [5]:
codons = ('atg', 'taa', 'ttt')
codons[1] = 'ggg'

TypeError: 'tuple' object does not support item assignment

In [None]:
dna = "GATC"
print(dna)

# this does not change the original string: the name dna now refers to a new string object
dna = "AATC"
print(dna)
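
To "change" a single character we therefore have to build a new string. The sketch below shows one possible way to do this. It uses the slice `dna[1:]`, a notation that is explained in the next section, and the name `original` is introduced only for this illustration.

```python
dna = "GATC"
original = dna        # a second name for the same string object

# Build a new string: an "A" followed by everything after the first character.
dna = "A" + dna[1:]

print(original)  # GATC (the original string object is unchanged)
print(dna)       # AATC
```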

## Slicing
**Slicing** is a technique that builds on indexing, but instead of accessing a single element it lets you access multiple elements at once. It uses the same square bracket notation as indexing. However, instead of a single index we can now specify a start, stop and step value inside the brackets, using the following notation:
`collection[start:stop:step]`

`seq[x:y:z]` returns a slice that starts at, and includes, index x (or 0 if omitted) and runs up to, but excludes, index y (or the end if omitted), taking every z-th element (or every element if z is omitted). With a negative step the slice runs from right to left, which is how `seq[::-1]` reverses a sequence. A slice always has the same data type as the collection it is taken from. This is different from indexing: indexing a string always returns a string, but indexing a list (see below) does not by definition return a list. Some examples:

In [4]:
seq = "GATC"
print(seq[1:]) # from index 1 up to and including the last element
print(seq[:3]) # from index 0 up to, but not including, index 3
print(seq[::2]) # from start to end, taking every second element
print(seq[::-1]) # reverse the string

ATC
GAT
GT
CTAG
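
Negative indexes can be used in slices as well, and unlike indexing, a slice does not raise an `IndexError` when its bounds fall outside the collection. The sketch below illustrates both points on the string from the figure. The variable names are chosen only for this example.

```python
seq = "GAATCTGG"

last_three = seq[-3:]     # the last three characters
without_ends = seq[1:-1]  # everything except the first and last character
clipped = seq[5:100]      # a stop beyond the end is cut off at the end
empty = seq[20:]          # a start beyond the end gives an empty string

print(last_three)    # TGG
print(without_ends)  # AATCTG
print(clipped)       # TGG
print(repr(empty))   # ''
```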


In [6]:
sequences = ['atc', 'agg', 'ttt']
print(sequences[1]) # indexing, returning a string
print(sequences[1:]) # slicing, returning a list
print(sequences[1:2]) # slicing, returning a list

agg
['agg', 'ttt']
['agg']


Note that a slice always returns the same data type as the collection that is being sliced.

**Slicing** (`sequences[1:2]`) returns a list with a single element.
**Indexing** (`sequences[1]`) returns the element stored at that index (a string in this case).
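
The difference can be checked with the built-in `type()` function. This is a small sketch; the names `element`, `sub_list`, `sub_tuple` and `sub_string` are only used for this illustration.

```python
sequences = ['atc', 'agg', 'ttt']
codons = ('atg', 'taa', 'ttt')

element = sequences[1]     # indexing a list returns the stored element
sub_list = sequences[1:2]  # slicing a list returns a list
sub_tuple = codons[:2]     # slicing a tuple returns a tuple
sub_string = 'GATC'[:2]    # slicing a string returns a string

print(type(element))     # <class 'str'>
print(type(sub_list))    # <class 'list'>
print(type(sub_tuple))   # <class 'tuple'>
print(type(sub_string))  # <class 'str'>
```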

In [None]:
total_sequences = ['atc', 'agg', 'ttt', 'ccc', 'ggg']
print(total_sequences)

total_sequences[1] = 'ata' # replace item
print(total_sequences)

del total_sequences[1] # remove by index
print(total_sequences)

You can also insert, change or delete multiple items at once by assigning to a slice:

In [None]:
my_list = ["atc", "atg"]
my_list[1:1] = ["ccc", "ggg"] # insert multiple items at index 1
print(my_list)
my_list[1:3] = ["aaa", "ttt"] # replace the items at indexes 1 and 2
print(my_list)
my_list[1:3] = [] # delete the items at indexes 1 and 2
print(my_list)
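
Two related points are worth illustrating: `del` also accepts a slice, and the list assigned to a slice does not need to have the same length as the slice it replaces. The sketch below uses made-up data to show both.

```python
my_list = ["atc", "atg", "ccc", "ggg", "ttt"]

del my_list[1:3]  # remove the items at indexes 1 and 2
print(my_list)    # ['atc', 'ggg', 'ttt']

my_list[0:1] = ["aaa", "cca", "gga"]  # replace one item with three items
print(my_list)    # ['aaa', 'cca', 'gga', 'ggg', 'ttt']
print(len(my_list))  # 5
```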

Remember the multidimensional arrays (matrices) we introduced in the list_en_tuples Notebook? There, we did not show how you can access elements of nested lists. In the next example we revisit the lists of lists and show how you can use **chaining** (multiple operations on the same object in a single statement) to access elements of a nested list.

In [18]:
# a table:
my_table = [[1, 2, 3],
            [4, 5, 6],
            [7, 8, 9]]

print(my_table)

print(my_table[0]) # first row using indexing
print(my_table[1]) # second row using indexing
print(my_table[1][0]) # first element of the second row: a second round of indexing is applied to the result of the first (this is called chaining)
print(my_table[1][1:3]) # of course, we can also use slicing to access multiple elements

[[1, 2, 3], [4, 5, 6], [7, 8, 9]]
[1, 2, 3]
[4, 5, 6]
4
[5, 6]


In [20]:
# why doesn't the following work?
print(my_table[0:2][2])

# try to print the first part of the chaining (`my_table[0:2]`) and see what is being returned

IndexError: list index out of range
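
When chaining, each operation is applied to the result of the previous one. The sketch below inspects the intermediate result and then shows one possible way to take the third element of each of the first two rows. It assumes you are willing to use a list comprehension, which may not have been covered yet. The names `first_two_rows` and `third_of_each` are introduced only for this illustration.

```python
my_table = [[1, 2, 3],
            [4, 5, 6],
            [7, 8, 9]]

first_two_rows = my_table[0:2]  # slicing returns a list of two rows
print(first_two_rows)           # [[1, 2, 3], [4, 5, 6]]
print(len(first_two_rows))      # 2, so only indexes 0 and 1 are valid here

# Apply the second index to every row separately instead of to the list of rows.
third_of_each = [row[2] for row in first_two_rows]
print(third_of_each)            # [3, 6]
```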