# Lists

The variable types that we have used up until now are limiting: each variable has stored only a single piece of information (a number or a piece of text). To handle complex biological datasets, we'll need to store many numbers or strings (such as DNA sequences) within a single variable.

An extremely common and useful data type in Python is the list. Lists are composed of individual 'elements' (sometimes also referred to as items), and those elements can be of many different types - int, float, string, or even lists or dictionaries (a data type we'll cover later in the course). A single list can contain a mixture of these data types. A list is written as a series of elements separated by commas and enclosed in square brackets:

In [1]:
cell_types = ["hepatocyte", "adipocyte", "erythrocyte", "fibroblast"]

subject_heights = [1.83, 1.57, 1.74, 1.61, 1.86, 1.75, 1.67] 

mixed_datatypes = ["Kathmandu", 5021, 11.823, "Brian", 11]

### Concatenating lists

Lists can be joined together through concatenation using the + operator:

In [1]:
subject_heights = [1.83, 1.57, 1.74] + [1.61, 1.86, 1.78, 1.75] 
print(subject_heights)

[1.83, 1.57, 1.74, 1.61, 1.86, 1.78, 1.75]


In [1]:
domestic_animals = ["dog", "cat"]
farm_animals = ["cow", "sheep", "goat"]
animals = farm_animals + domestic_animals
print(animals)

['cow', 'sheep', 'goat', 'dog', 'cat']
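
It is worth noting that `+` builds a new list and leaves the two lists being joined unchanged. The short sketch below reuses the animal lists from the example above to illustrate this:

```python
# Concatenation with + creates a new list; the original lists are not modified.
domestic_animals = ["dog", "cat"]
farm_animals = ["cow", "sheep", "goat"]

animals = farm_animals + domestic_animals

print(len(animals))       # the combined list has 5 elements
print(farm_animals)       # still ['cow', 'sheep', 'goat']
print(domestic_animals)   # still ['dog', 'cat']
```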


### Accessing Values in Lists

Lists are an ordered data type - each item occurs at a position known as the index. Just as we were able to access individual characters in a string in the last section, the values stored in individual list elements can be recalled directly using the index for that element. The first element of the list is given the index 0, the second element is at index 1, and so on. For convenience, the last element in the list can be accessed at index -1 (you should experiment to see what is at index -2). 

In [1]:
cell_types = ["hepatocyte", "adipocyte", "erythrocyte", "monocyte", "fibroblast"]

print(cell_types[0]) #prints "hepatocyte"

print(cell_types[2]) #prints "erythrocyte"

print(cell_types[-1]) #prints "fibroblast"

hepatocyte
erythrocyte
fibroblast
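
Because indexing starts at 0, the highest valid index is always one less than the number of elements in the list. The sketch below illustrates this using `len()`, and shows that asking for an index beyond the end of the list raises an `IndexError`. The `try`/`except` statement may not have been introduced yet; it is used here only so that the example runs to completion.

```python
cell_types = ["hepatocyte", "adipocyte", "erythrocyte", "monocyte", "fibroblast"]

# The last valid index is the length of the list minus one
last_index = len(cell_types) - 1
print(last_index)               # 4
print(cell_types[last_index])   # same element as cell_types[-1]

# Index 5 does not exist in a list of five elements
try:
    print(cell_types[5])
except IndexError as error:
    print("IndexError:", error)
```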


Multiple items can also be recalled at once by specifying a range of indices. Ranges are non-inclusive of the end number, so a range of 0:2 returns the elements at index 0 and 1, but not 2.

In [3]:
cell_types = ["hepatocyte", "adipocyte", "erythrocyte", "monocyte", "fibroblast"]
print(cell_types[0:2]) 

['hepatocyte', 'adipocyte']
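
Either end of the range can also be left out, in which case the slice runs from the start of the list or to its end. Negative indices work in ranges too. A brief sketch (the variable names `first_two`, `from_third` and `last_two` are just illustrative):

```python
cell_types = ["hepatocyte", "adipocyte", "erythrocyte", "monocyte", "fibroblast"]

first_two = cell_types[:2]    # start omitted: begins at index 0
from_third = cell_types[2:]   # end omitted: runs to the end of the list
last_two = cell_types[-2:]    # negative start: the final two elements

print(first_two)    # ['hepatocyte', 'adipocyte']
print(from_third)   # ['erythrocyte', 'monocyte', 'fibroblast']
print(last_two)     # ['monocyte', 'fibroblast']
```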


#### Extended slices

Just as for strings, extended slices allow us to skip elements in a list, or reverse it.

In [7]:
cell_types = ["hepatocyte", "adipocyte", "erythrocyte", "monocyte", "fibroblast"]
print(cell_types[0:5:2]) #prints every second list element
print(cell_types[::-1]) #reverses the list order

['hepatocyte', 'erythrocyte', 'fibroblast']
['fibroblast', 'monocyte', 'erythrocyte', 'adipocyte', 'hepatocyte']


### Adding Elements to Lists

You can add additional elements to a list using the `extend`, `append` or `insert` methods.

**append**: Appends an object at the end of a list. 

***Important Note:*** If the appended object is itself a list, it is added as a single element (a list nested inside the original list). Its items are not added individually (concatenated) to the original list; for that, use `extend`. This is demonstrated in the second `append` example below.

In [6]:
cell_types = ["hepatocyte", "adipocyte", "erythrocyte"]
cell_types.append("monocyte")
print(cell_types)

['hepatocyte', 'adipocyte', 'erythrocyte', 'monocyte']


In [4]:
#adding a list object to a list
cell_types = ["hepatocyte", "adipocyte", "erythrocyte"]
cell_types.append(["monocyte" , "fibroblast"])
print(cell_types)

['hepatocyte', 'adipocyte', 'erythrocyte', ['monocyte', 'fibroblast']]
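
To see that the appended list really is a single element, we can check the length of the outer list and index into the nested list. This is a small sketch continuing the example above:

```python
cell_types = ["hepatocyte", "adipocyte", "erythrocyte"]
cell_types.append(["monocyte", "fibroblast"])

print(len(cell_types))     # 4, not 5: the appended list counts as one element
print(cell_types[3])       # the nested list ['monocyte', 'fibroblast']
print(cell_types[3][0])    # first element of the nested list: monocyte
```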


**extend**: Extends a list by appending each element of an iterable (e.g. a list, dictionary, tuple or even a string) to it.

In [8]:
tree_girths = [10.1, 20.2, 17.8]
tree_girths.extend([22.8, 15.5])
print(tree_girths)

[10.1, 20.2, 17.8, 22.8, 15.5]
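
Because `extend` adds each element of the iterable separately, passing it a string adds the string one character at a time. This is a common source of surprises, as the sketch below shows (the list name `stop_codons` is just illustrative):

```python
stop_codons = ["TAA"]

# Extending with a list adds each list element
stop_codons.extend(["TAG"])
print(stop_codons)   # ['TAA', 'TAG']

# Extending with a string adds each character as a separate element
stop_codons.extend("TGA")
print(stop_codons)   # ['TAA', 'TAG', 'T', 'G', 'A']
```

To add a whole string as a single element, use `append` instead.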


**insert**: Inserts an element into a list at a specified position

In [9]:
cell_types = ["hepatocyte", "adipocyte", "erythrocyte"]
#insert a string at index 2 (i.e. the third position)
cell_types.insert(2, "monocyte") 
print(cell_types)

['hepatocyte', 'adipocyte', 'monocyte', 'erythrocyte']


#### Overwrite Elements
It is also possible to edit the elements of a list using the same indexing notation, assigning a value just as you would to a variable.

In [1]:
cell_types = ["hepatocyte", "adipocyte", "erythrocyte"]
# Change the second cell type
cell_types[1] = "monocyte"
print(cell_types)

['hepatocyte', 'monocyte', 'erythrocyte']


**Note that the cell type "adipocyte" has been removed from the list, and "monocyte" has been put in its place**

### Removing values from lists

There are a number of ways to remove items from a list in Python:

**del**: removes the element (or range of elements) at the specified index from a list


In [10]:
cell_types = ["hepatocyte" , "adipocyte" , "erythrocyte"]
#delete element at index 1 (i.e. the second position)
del cell_types[1]
print(cell_types)

['hepatocyte', 'erythrocyte']


**remove**: removes the first matching value from a list

In [11]:
cell_types = [ "hepatocyte" , "adipocyte" , "erythrocyte" ]
#remove the first occurrence of "hepatocyte" 
cell_types.remove("hepatocyte")
print(cell_types)

['adipocyte', 'erythrocyte']
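
Two details of `remove` are worth illustrating: only the first matching value is removed, and asking it to remove a value that is not in the list raises a `ValueError`. The sketch below guards against the second case with the `in` operator, which may not have been introduced yet; it simply tests whether a value is present in a list.

```python
cell_types = ["monocyte", "adipocyte", "monocyte"]

# Only the first "monocyte" is removed; the second one stays
cell_types.remove("monocyte")
print(cell_types)   # ['adipocyte', 'monocyte']

# Check that a value is present before trying to remove it
if "fibroblast" in cell_types:
    cell_types.remove("fibroblast")
else:
    print("fibroblast is not in the list")
```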


**pop**: removes the element at the specified index from a list and returns its value. Useful if you want to remove elements from a list as you process them.

In [12]:
cell_types = [ "hepatocyte" , "adipocyte" , "erythrocyte" ]
removed_value = cell_types.pop(0)
print(removed_value)

hepatocyte
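
The list itself is shortened each time `pop` is called. If no index is given, `pop` removes and returns the last element. A short sketch (the variable names `first` and `last` are just illustrative):

```python
cell_types = ["hepatocyte", "adipocyte", "erythrocyte"]

first = cell_types.pop(0)   # removes and returns the element at index 0
last = cell_types.pop()     # no index given: removes and returns the last element

print(first)        # hepatocyte
print(last)         # erythrocyte
print(cell_types)   # ['adipocyte']
```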


### Sorting lists

Lists can be sorted using either the `sorted()` function or the `sort` method. The difference between the two options is that `sorted()` will return a new sorted list, leaving the original list intact, whereas `sort` will modify a list by sorting it in-place. 

In [13]:
#using .sort()
subject_heights = [ 1.83 , 1.57 , 1.74 , 1.61 , 1.86 , 1.78 , 1.75 , 1.67 ] 
subject_heights.sort()
print(subject_heights)

[1.57, 1.61, 1.67, 1.74, 1.75, 1.78, 1.83, 1.86]


In [2]:
#using sorted()
subject_heights = [ 1.83 , 1.57 , 1.74 , 1.61 , 1.86 , 1.78 , 1.75 , 1.67 ] 
ascending_heights = sorted(subject_heights)
print(ascending_heights)
print(subject_heights) #original list is not modified / sorted

[1.57, 1.61, 1.67, 1.74, 1.75, 1.78, 1.83, 1.86]
[1.83, 1.57, 1.74, 1.61, 1.86, 1.78, 1.75, 1.67]


Elements can also be sorted in descending order by specifying the `reverse=True` argument when calling the `sorted()` function:

In [15]:
subject_heights = [ 1.83 , 1.57 , 1.74 , 1.61 , 1.86 , 1.78 , 1.75 , 1.67 ] 
descending_heights = sorted(subject_heights, reverse=True)
print(descending_heights)

[1.86, 1.83, 1.78, 1.75, 1.74, 1.67, 1.61, 1.57]
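
The `sort` method accepts the same `reverse=True` argument. Because `sort` works in-place, it returns `None` rather than the sorted list, so its result should not be assigned to a variable. The sketch below uses a shortened list of heights to show both points (the variable name `result` is just illustrative):

```python
subject_heights = [1.83, 1.57, 1.74]

# sort() modifies the list in-place and returns None
result = subject_heights.sort(reverse=True)

print(result)            # None
print(subject_heights)   # [1.83, 1.74, 1.57]
```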


#### Additional List Functions and methods

Just as for strings, there are a variety of functions and methods that we can use on lists.

|Function/Method| Description |
|-------|-------|
|len(list)| Returns the number of elements in the list.|
|max(list)| Returns element from the list with highest value.|
|min(list)| Returns element from the list with lowest value.|
|list.reverse()| Reverses objects of list in place|
|sum(list)| Returns the sum of all numeric values in a list|


In [12]:
subject_heights = [ 1.83 , 1.57 , 1.74 , 1.61 , 1.86 , 1.78 , 1.75 , 1.67 ] 
print(len(subject_heights)) #prints number of elements in the list (8)
print(max(subject_heights)) #prints max value in list (1.86)
print(min(subject_heights)) #prints minimum value in list (1.57)
subject_heights.reverse()   #list is now reversed
print(subject_heights)      #prints the now reversed list
print(sum(subject_heights)) #prints sum of values in list (13.81)

8
1.86
1.57
[1.67, 1.75, 1.78, 1.86, 1.61, 1.74, 1.57, 1.83]
13.81


# Exercises

* Add together the following two lists of oligos and print the list in alphabetical order.

In [16]:
oligos_fwd = ["ATGCTGA", "GATACAT", "CATGACTG"]
oligos_rev = ["GATTCAT", "TACATCA", "TACCAGTA"]

* Sort the following list in ascending order, then print the middle value in the list (this list has an odd number of elements, so the middle value will be the median of the values).

Additional advanced exercise: can you use conditional statements to modify your code so that it also correctly calculates the median for an even number of samples?

In [1]:
eggs_laid = [10,11,14,16,13,14,12,10,16,11,12,13,16,11,10,14,16,16,15,12,11,13,14,15,12,14,12,15,13,13,11]



As part of earthworm awareness week, students from the University of Edinburgh sampled the number of earthworms at different sites. The number of earthworms in each 0.5 cubic metre soil sample was entered in a list as the students collected the data:

  * How many samples were taken in total? Hint: You should use the function 'len()'
  * How many worms were counted in the 20th sample (remember that lists are zero-indexed) and in the last sample?
  * What were the minimum and maximum number of worms found in a single sample?
  * What are the mean and median number of worms found per 0.5 cubic metre soil sample? Hint: sum() returns the sum of all values in the list.

In [2]:
number_of_worms = [24, 12, 16, 18, 21, 3, 8, 17, 32, 29, 13, 15, 30, 19, 5, 7, 18, 7, 19, 17, 25, 27, 19, 28, 14, 16, 29, 18, 4, 10, 23, 8, 18, 15, 16, 8]

