### Lists

Up until now we have used individual variables. One of the biggest benefits of having a computer is operating on lots and lots of data. This can be hard to organize with just variables, because you'd need a different variable for every data point.

For example, say a teacher wanted to calculate the average grade in their class. The following code would work, but takes a lot of effort:

In [21]:
student0 = 80
student1 = 93
student2 = 34
student3 = 68
student4 = 71
student5 = 95
student6 = 91
student7 = 87

average_grade = (student0 + student1 + student2 + student3 + student4 + student5 + student6 + student7) / 8
print(average_grade)

77.375


Every time we add or remove a student, we have to add or remove that student's variable from the sum on line 10, and also update the total number of students.

Python has a data structure called a **list** that makes it possible to group similar data together. Lists are defined using square brackets []. If we wanted to use a list for the above calculation, it would look something like this:

In [8]:
student_grades = [80,93,34,68,71,95,91,87]

average_grade = sum(student_grades)/len(student_grades)
print(average_grade)

77.375


A list can have any number of objects in it, and you can add and remove objects whenever you want. You can also mix different types of objects in a list, but that can make the list harder to work with. You can define every element in the list at the same time, or add them on the fly:

In [13]:
#adding items to a list when it is declared
my_list = [1,2,3,4,5]

#declaring an empty list and then adding to it
my_other_list = []
my_other_list.append(1)
my_other_list.append(2)
my_other_list.append(3)
my_other_list.append(4)
my_other_list.append(5)

print(my_list)
print(my_other_list)

[1, 2, 3, 4, 5]
[1, 2, 3, 4, 5]


In [14]:
#add one more element to my_list
my_list.append(42)

print(my_list)
print(my_other_list)

[1, 2, 3, 4, 5, 42]
[1, 2, 3, 4, 5]
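
The examples above only add items, so here is a minimal sketch of removing them too. The list name `shopping` and its contents are made up for illustration. `pop()` removes and returns the last element, and `remove()` deletes the first element equal to the value you give it:

```python
# hypothetical example list, not part of the lesson data
shopping = ["eggs", "milk", "bread", "milk"]

last_item = shopping.pop()   # removes and returns the last element
shopping.remove("eggs")      # removes the first element equal to "eggs"

print(last_item)
print(shopping)
```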


From the grade calculation, you can see that we can apply functions to lists for some pretty powerful results. `sum()` gives us the sum of every element in the list, and `len()` gives us the length of the list. You can also do fancier things, like reversing the list:

In [20]:
print(my_list)
my_list.reverse()
print(my_list)

[1, 2, 3, 4, 5, 42]
[42, 5, 4, 3, 2, 1]
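
`sum()` and `len()` are not the only built-in functions that accept a list. As a small sketch reusing the grades from the earlier example, `min()` and `max()` return the smallest and largest elements:

```python
# same grades as in the average-grade example
student_grades = [80, 93, 34, 68, 71, 95, 91, 87]

lowest = min(student_grades)    # smallest element in the list
highest = max(student_grades)   # largest element in the list

print(lowest)
print(highest)
```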


If you use a list, you can still access the individual items inside it through a process called **indexing**. Every element in a list has an index, starting from zero and counting up. You can use these indices with square brackets, just like when you declare the list:

In [22]:
my_list = [4,3,6,8,2,3,1]
print(my_list[0])
print(my_list[1])
print(my_list[2])

4
3
6
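
Because indices start at zero, the last valid index is one less than the length of the list. The sketch below shows what happens if you go one past the end. It uses `try`/`except`, which this lesson has not covered, only so that the error can be printed instead of stopping the cell:

```python
my_list = [4, 3, 6, 8, 2, 3, 1]

last_index = len(my_list) - 1   # indices run from 0 to len(my_list) - 1
print(my_list[last_index])

try:
    my_list[len(my_list)]       # one past the end of the list
except IndexError as err:
    print("IndexError:", err)
```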


You can also change entries in a list via their index:

In [2]:
my_list = [0,1,2,3,4,5]
print(my_list)
my_list[2] = 256
print(my_list)

[0, 1, 2, 3, 4, 5]
[0, 1, 256, 3, 4, 5]


There are also fancier ways to index into a list, often called **slicing**. This operation allows you to take a subset of a list by providing the index where the subset starts and the index where it stops:

In [24]:
my_list = [4,3,6,8,2,3,1]
my_list_subset = my_list[2:4]
print(my_list_subset)

[6, 8]


*Note*: the stop index itself is not included in the result. In the above example, even though we wrote `my_list[2:4]`, only the elements at indices 2 and 3 were returned.
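
One handy consequence of excluding the stop index is sketched below: when both indices fall inside the list, the slice contains exactly `stop - start` elements. The names `start` and `stop` are illustrative only:

```python
my_list = [4, 3, 6, 8, 2, 3, 1]
start, stop = 2, 4   # illustrative variable names

subset = my_list[start:stop]          # elements at indices 2 and 3
print(subset)
print(len(subset) == stop - start)    # the slice length equals stop - start
```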

You can also omit the index on either side of the colon to represent the beginning and end of the list:

In [26]:
my_list = [4,3,6,8,2,3]
first_half = my_list[:3]
second_half = my_list[3:]
print(first_half)
print(second_half)

[4, 3, 6]
[8, 2, 3]


This is a good time to note that you can combine lists with the **+** operator, which creates a new list containing the elements of the first list followed by the elements of the second:

In [28]:
full_list = first_half + second_half
print(first_half)
print(second_half)
print(full_list)

[4, 3, 6]
[8, 2, 3]
[4, 3, 6, 8, 2, 3]


If we add the two lists in the reverse order, we will get a different result:

In [34]:
full_list = second_half + first_half
print(full_list)

[8, 2, 3, 4, 3, 6]
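
Note that `+` leaves both of the original lists untouched. If you would rather grow an existing list in place, the list's `extend()` function does that. A minimal sketch, with `combined` as a made-up name:

```python
first_half = [4, 3, 6]
second_half = [8, 2, 3]

combined = first_half + second_half   # builds a new list
print(first_half)                     # first_half is unchanged

first_half.extend(second_half)        # appends the elements to first_half itself
print(first_half)
```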


You can use negative indices to count from the back of the list, if you'd like:

In [37]:
my_list = ['a','b','c','d','e','f']
print(my_list[-1])
print(my_list[-2])
print(my_list[-5:])

f
e
['b', 'c', 'd', 'e', 'f']


Try extracting the DNA sequence below from the noise around it: 

In [24]:
seq = ['$','#','7','3','T','G','G','A','C','2','&','w','0']

If you want to step through the list in increments other than one, you can specify the step size after a second colon. So a complete list slicing specification is `list_name[start_index:stop_index:increment]`. For example, if we wanted only the even indices from our list:

In [12]:
my_list = ["even", "odd", "even", "odd", "even", "odd", "even"]
print(my_list[::2])

['even', 'even', 'even', 'even']


`my_list[::2]` effectively means "start at the beginning of the list, stop at the end of the list, and move two items every time". If we wanted just the odd indices, we would need to start at index 1 instead of index 0:

In [13]:
print(my_list[1::2])

['odd', 'odd', 'odd']
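
The start, stop, and increment can all be given at once. As a small sketch with a made-up list called `letters`:

```python
# illustrative list, not part of the lesson data
letters = ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h']

middle_odds = letters[1:6:2]   # indices 1, 3, 5
every_third = letters[:5:3]    # indices 0, 3

print(middle_odds)
print(every_third)
```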


You can also use this step size to reverse the list by making it negative. `my_list[::-1]` will return the list in reverse order, `my_list[::-2]` will return every other element in the list in reverse order, and so on. A reversed list can also be obtained through the `reverse()` function of a list, so you can use whichever feels more comfortable.

In [17]:
l = ["zeroth", "first", "second", "third"]
print(l)
print(l[::-1])

print(l)
l.reverse()
print(l)

['zeroth', 'first', 'second', 'third']
['third', 'second', 'first', 'zeroth']
['zeroth', 'first', 'second', 'third']
['third', 'second', 'first', 'zeroth']


Notice that our slicing operation `l[::-1]` does **not** modify the original list. It just gives us a copy of the list with the desired properties. `reverse()`, on the other hand, actually modifies the list itself. If we want our slicing to update the list instead of just giving us a copy of it, we can reassign the list variable to the new copy:

In [20]:
my_list = ["a","b","c","d","e","f"]
my_list = my_list[::-1]
print(my_list)

['f', 'e', 'd', 'c', 'b', 'a']
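
Because a slice is a separate copy, changing the copy does not change the list it came from. A minimal sketch, with `original` and `copy_of_original` as illustrative names:

```python
original = ["a", "b", "c"]
copy_of_original = original[:]   # a slice with no start or stop copies the whole list

copy_of_original[0] = "z"        # only the copy is changed

print(original)
print(copy_of_original)
```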


### Exercises

1. Update the list given below to place all of the items with even indices before all of the elements with odd indices. For example, the list `["e","o","e","o","e"]` would become `["e","e","e","o","o"]`

In [25]:
color_list = ["red", "blue","orange","indigo","yellow","violet","green"]

2. Swap the first and last elements in the following list:

In [None]:
day_list = ["Monday", "Tuesday", "Wednesday", "Thursday", "Friday"]

3. Sort the following list of numbers.
(hint: we have not explicitly covered this, see if you can Google it to find out)

In [28]:
num_list = [234,342,678,443,578,5,23454567767555,909]

[5, 234, 342, 443, 578, 678, 909, 23454567767555]


### Strings

All of the above indexing from lists also works on strings, which you can think of as lists of characters. 

In [33]:
my_string = "this_is_a_long_string"
print(my_string[3])
print(my_string[5:9])
print(my_string[-6:])

s
is_a
string
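
One difference from lists is worth knowing: you can read characters by index, but you cannot assign to them, because strings cannot be changed once created. The sketch below uses `try`/`except` (not covered in this lesson) only to print the error, and then builds a new string instead. The name `new_string` is illustrative:

```python
my_string = "this_is_a_long_string"

try:
    my_string[0] = "T"               # item assignment is not supported on strings
except TypeError as err:
    print("TypeError:", err)

# build a new string from a replacement character and a slice instead
new_string = "T" + my_string[1:]
print(new_string)
```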


Strings, like other objects, have functions associated with them that are different from the regular Python functions. You can think of these functions as "owned" by the string, and they describe what a string is capable of doing, or information you can get about a string. You can access a list of these functions by running `help(str)`, or by typing the name of a string variable followed by a period, and then pressing tab.

In [6]:
help(str)

Help on class str in module builtins:

class str(object)
 |  str(object='') -> str
 |  str(bytes_or_buffer[, encoding[, errors]]) -> str
 |  
 |  Create a new string object from the given object. If encoding or
 |  errors is specified, then the object must expose a data buffer
 |  that will be decoded using the given encoding and error handler.
 |  Otherwise, returns the result of object.__str__() (if defined)
 |  or repr(object).
 |  encoding defaults to sys.getdefaultencoding().
 |  errors defaults to 'strict'.
 |  
 |  Methods defined here:
 |  
 |  __add__(self, value, /)
 |      Return self+value.
 |  
 |  __contains__(self, key, /)
 |      Return key in self.
 |  
 |  __eq__(self, value, /)
 |      Return self==value.
 |  
 |  __format__(self, format_spec, /)
 |      Return a formatted version of the string as described by format_spec.
 |  
 |  __ge__(self, value, /)
 |      Return self>=value.
 |  
 |  __getattribute__(self, name, /)
 |      Return getattr(self, name).
 |  
 |  

I'll add a few examples of things you can do with strings here, but this is by no means an exhaustive list. You can usually use Google to find what function you need. If I want to convert a string to uppercase, I would search "python string uppercase", and probably just look at the first link. 

In [11]:
s = "Hello"
print(s)
print(s.upper())
print(s.lower())
print(s.isdecimal())

print("---------")

s = "394"
print(s)
print(s.upper())
print(s.lower())
print(s.isdecimal())

Hello
HELLO
hello
False
---------
394
394
394
True
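
A few more string functions that tend to come up often are sketched below, using a made-up string. `strip()` removes whitespace from both ends, `replace()` swaps one substring for another, `split()` breaks a string into a list, and `startswith()` checks how a string begins:

```python
s = "  Hello, World  "   # illustrative string with extra spaces

trimmed = s.strip()                            # remove leading and trailing whitespace
swapped = trimmed.replace("World", "Python")   # returns a new string with the substitution
pieces = trimmed.split(", ")                   # split into a list of substrings

print(trimmed)
print(swapped)
print(pieces)
print(trimmed.startswith("Hello"))
```

As with `upper()` and `lower()`, each of these returns a new value and leaves the original string unchanged.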


### Exercises

1. Determine the GC content of the following sequence:

In [29]:
seq = "ATGTATTGCTTCTAGACAC"

2. Determine the average GC content across all sequences in the list:

In [30]:
seq_list = ["ATGTATTGCTTCTAGACAC", "GTGGATAGCCGCGAAGTCTAGCTTCGATD", "ATGGTCTCTTGATTGCTGAAAGAGAAAAAA"]

3. The following list contains three small sequences that are formatted differently. Combine them all into a single, consistently formatted string called **my_seq**

In [None]:
seq_list = ["agga", "gtC", "CAAAAT"]