<h1 id="toctitle">Lists and loops</h1>
<ul id="toc"/>

Storing several values currently means creating one variable per value, which quickly becomes awkward, as we saw in the exercise from the previous session:

In [1]:
# three string variables
header1 = "ABC123"
header2 = "DEF456"
header3 = "HIJ789"

##Lists
###Defining a list and getting elements
Python's solution is the list, a single variable that stores multiple values in order. A list starts and ends with square brackets, and its __elements__ are separated by commas:

In [2]:
headers = ["ABC123","DEF456","HIJ789"]

Once we have stored some values in a list, we can get a single element by giving its __index__:

In [3]:
headers[1]

'DEF456'

We can also go the other way and find the index of an element:

In [4]:
headers.index("HIJ789")

2
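As a small sketch of how `index()` behaves when the element is missing: it raises a `ValueError`, so it can help to check with `in` first. The value `"XYZ000"` is a made-up header used only for illustration.

```python
headers = ["ABC123", "DEF456", "HIJ789"]

# index() returns the position of the first matching element
position = headers.index("HIJ789")

# "XYZ000" is a hypothetical value that is not in the list;
# calling headers.index("XYZ000") directly would raise a ValueError
if "XYZ000" in headers:
    missing_position = headers.index("XYZ000")
else:
    missing_position = None

print(position)
print(missing_position)
```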

Indexes work just like they did for strings:

In [6]:
headers[1:3]

['DEF456', 'HIJ789']

Counting starts from zero, and a slice includes the start index but excludes the end index, so `headers[1:3]` returns the elements at indexes 1 and 2.
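To make the indexing rules concrete, here is a short sketch using the same `headers` list. Negative indexes and open-ended slices are not shown above, but they work for lists just as they do for strings.

```python
headers = ["ABC123", "DEF456", "HIJ789"]

first = headers[0]          # indexes start at zero
last = headers[-1]          # negative indexes count back from the end
first_two = headers[0:2]    # includes index 0, stops before index 2
from_second = headers[1:]   # leaving out the end runs to the end of the list

print(first, last)
print(first_two)
print(from_second)
```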

###Building up a list

Often we want to start with an empty list:

In [7]:
headers = []

and then build it up one element at a time. `append()` adds a value to the end of the list:

In [10]:
headers = []
print(headers)
headers.append("ABC123")
print(headers)

[]
['ABC123']


We can also concatenate two lists:

In [11]:
headers + ["DEF456","HIJ789"]

['ABC123', 'DEF456', 'HIJ789']

Note that this doesn't change the original list:

In [12]:
headers

['ABC123']

If we try to concatenate a single value and a list, we will get an error:

In [13]:
headers + "DEF456"

TypeError: can only concatenate list (not "str") to list
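One way around this error, sketched below, is to wrap the single value in square brackets so that both sides of `+` are lists. The sketch also contrasts `+` with `append()` and `extend()`, which change the list in place. The headers `"KLM012"` and `"NOP345"` are invented for the example.

```python
headers = ["ABC123"]

# wrap the single value in a list so both sides of + are lists
longer = headers + ["DEF456"]

# append() adds one element to the existing list
headers.append("HIJ789")

# extend() adds every element of another list to the existing list
# ("KLM012" and "NOP345" are made-up values)
headers.extend(["KLM012", "NOP345"])

print(longer)
print(headers)
```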

###Creating a list by splitting a string

The `split()` method of strings returns a list:

In [14]:
sentence = "one two three four"
sentence.split(" ")

['one', 'two', 'three', 'four']

In [15]:
sentence.split("t")

['one ', 'wo ', 'hree four']

This doesn't change the string:

In [34]:
sentence

'one two three four'
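A few more uses of `split()`, as a rough sketch: splitting on a comma, splitting with no argument (which splits on any run of whitespace), and reversing the process with `join()`. The example strings are made up for illustration.

```python
line = "ABC123,DEF456,HIJ789"

# split on commas to get one element per field
fields = line.split(",")

# with no argument, split() breaks on any run of whitespace
spaced = "one  two   three"
words = spaced.split()

# join() goes the other way: list of strings back to one string
rejoined = "-".join(fields)

print(fields)
print(words)
print(rejoined)
```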

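Another way to create a list is with the `range()` function, which generates sequences of numbers. In Python 3 `range()` returns a range object rather than a list, so we wrap it in `list()` to see the values.

With one argument it counts from zero to just before the number:

In [35]:
list(range(10))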

[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]

With two arguments it counts from the first number to just before the second:

In [36]:
list(range(5,12))

[5, 6, 7, 8, 9, 10, 11]

With three arguments, it uses the third argument as the step size:

In [37]:
list(range(3, 24, 4))

[3, 7, 11, 15, 19, 23]
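Two further uses of `range()`, sketched here with the result wrapped in `list()` so the numbers are displayed as a list on Python 3 as well as Python 2: a negative step counts downwards, and `range(len(...))` gives the valid indexes of a list.

```python
headers = ["ABC123", "DEF456", "HIJ789"]

# a negative step counts down; the stop value is still excluded
countdown = list(range(10, 0, -2))

# range(len(headers)) gives every valid index of the list
positions = list(range(len(headers)))

print(countdown)
print(positions)
```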

##Tools for manipulating lists

###Sorting and reversing a list

Starting with our headers:

In [19]:
headers = ["DEF456","ABC123","HIJ789"]

Calling the `sort()` method sorts the list in place, alphabetically for strings:

In [20]:
headers.sort()
headers

['ABC123', 'DEF456', 'HIJ789']

Lists of numbers are sorted numerically:

In [22]:
numbers = [3,8,4,6,9,1,5,7,4,2,6,8,5,3]
numbers.sort()
numbers

[1, 2, 3, 3, 4, 4, 5, 5, 6, 6, 7, 8, 8, 9]

We can also reverse the current order of a list with `reverse()`. Note that this flips the existing order and does not sort the list:

In [23]:
headers = ["DEF456","ABC123","HIJ789"]
headers.reverse()
headers

['HIJ789', 'ABC123', 'DEF456']
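Both `sort()` and `reverse()` change the list itself. When the original order needs to be kept, the built-in `sorted()` function returns a new sorted list instead. The sketch below also shows a common slip: `sort()` returns `None`, so assigning its result to a variable does not give a sorted list.

```python
headers = ["DEF456", "ABC123", "HIJ789"]

# sorted() returns a new list and leaves the original untouched
in_order = sorted(headers)
backwards = sorted(headers, reverse=True)
print(headers)      # still in the original order
print(in_order)
print(backwards)

# sort() works in place and returns None
result = headers.sort()
print(result)
print(headers)      # now sorted
```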

##Loops

We know how to do something for one element of a list:

In [24]:
headers = ["DEF456","ABC123","HIJ789"]
one_header = headers[1]
print(one_header)

ABC123


How do we do something for every element of a list?

In [25]:
for header in headers:
    print(header)

DEF456
ABC123
HIJ789


`header` is a new variable created by the `for` line; we did not have to define it beforehand.
The `for` line ends with a colon.
The body of the loop is indented (tabs or spaces, but not both!).
The value of `header` is set to each element in turn:

In [27]:
for header in headers:
    print("This time, the header variable is " + header)

This time, the header variable is DEF456
This time, the header variable is ABC123
This time, the header variable is HIJ789
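A common pattern, sketched below, combines a loop with the list-building tools from earlier: start with an empty list (or a zero total) before the loop and update it inside the loop body. The variable names `lowercase_headers` and `total_length` are chosen for this example only.

```python
headers = ["DEF456", "ABC123", "HIJ789"]

lowercase_headers = []   # start empty and build up inside the loop
total_length = 0

for header in headers:
    lowercase_headers.append(header.lower())
    total_length = total_length + len(header)

print(lowercase_headers)
print(total_length)
```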


This is a very simple example. In fact, the loop body can contain any code we like, for instance:

>For each header, create a new file with the same name as the header, then write the length of the header to the file, then print the first three letters of the header in lower case.

In [30]:
headers = ["DEF456","ABC123","HIJ789"]

# the loop body can contain anything we want
for header in headers:
    myfile = open(header + ".txt", "w")
    myfile.write(str(len(header)) + "\n")
    myfile.close()
    print(header[0:3].lower())

def
abc
hij
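The same loop can also be written with a `with` block, which closes each file automatically even if something goes wrong part-way through. This is a sketch rather than a replacement for the version above; it writes into a temporary directory (an assumption made here so the example does not clutter the working folder).

```python
import os
import tempfile

headers = ["DEF456", "ABC123", "HIJ789"]

# scratch directory used only for this example
output_dir = tempfile.mkdtemp()

for header in headers:
    path = os.path.join(output_dir, header + ".txt")
    # the file is closed automatically at the end of the with block
    with open(path, "w") as myfile:
        myfile.write(str(len(header)) + "\n")
    print(header[0:3].lower())
```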


###Loops and strings

We can write a loop that uses a string as if it were a list - each character will behave like an individual element:

In [31]:
dna = "agcacgacgtagtgatcggcta"

# strings can pretend to be lists
# each element is a character
for base in dna:
    print(base)

a
g
c
a
c
g
a
c
g
t
a
g
t
g
a
t
c
g
g
c
t
a


It's easy to do this by accident, so if you see individual characters in your output, check that you're looping over a list and not a string.
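The sketch below shows how this mistake typically looks: a loop over a single string yields characters, while a loop over a one-element list yields the whole string. The names `from_string` and `from_list` are used only for this illustration.

```python
header = "ABC123"       # a single string
headers = ["ABC123"]    # a list containing one string

# looping over the string gives one character at a time
from_string = []
for item in header:
    from_string.append(item)

# looping over the list gives the whole string once
from_list = []
for item in headers:
    from_list.append(item)

print(from_string)
print(from_list)
```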

###Loops and files

We can treat a file object as if it were a list in which each line is an element:

In [33]:
my_file = open("test.txt")

# files can pretend to be lists
# each element is a line
for line in my_file:
    line_length = len(line.rstrip("\n"))
    print(line_length)

8
8
10
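Here is a self-contained version of the same idea that can be run without an existing _test.txt_. It first writes a small file and then loops over its lines. The three sequences are invented for this sketch, chosen only so that the line lengths match the output above.

```python
import os
import tempfile

# create a small example file; its contents are made up for illustration
path = os.path.join(tempfile.mkdtemp(), "test.txt")
with open(path, "w") as out:
    out.write("ATCGATCG\nGGCCTTAA\nATATATCGCG\n")

line_lengths = []
with open(path) as my_file:
    for line in my_file:
        # each line still ends with "\n", so strip it before measuring
        line_lengths.append(len(line.rstrip("\n")))

print(line_lengths)
```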


##Exercises

###Processing DNA in a file

The file _input.txt_ contains a number of DNA sequences, one per line. Each sequence starts with the same 14 base pair fragment – a sequencing adapter that should have been removed. 

Write a program that will (a) trim this adapter and write the cleaned
sequences to a new file and (b) print the length of each sequence to the screen.

###Multiple exons from genomic DNA

The file _genomic_dna2.txt_ contains a section of genomic DNA, and the file _exons.txt_ contains a list of start/stop positions of exons. Each exon is on a separate line and the start and stop positions are separated by a comma. 

Write a program that will extract the exon segments, concatenate them, and write them to a new file.

Hint: open these two files up in a text editor before you start coding so you can make sure you understand their format. 

###Bonus exercise: sliding windows

Write a program that will print a list of overlapping short segments from a long string (i.e. a sliding window approach). 

E.g. with input `abcdefg` and window size 4:

`abcd, bcde, cdef, defg`

Modify your program to print the AT content of each sliding window rather than the sequence. You can expand your list of segments to include the partial ones at the end.
