In [None]:
# Lists from Files -- In Class Oct 3
# Shruti Rabara 
# Date

# What is a list? #

A list is a specific example of an object type in Python called a sequence. Sequence types include strings, lists, tuples and range objects. They have index numbers associated with each position. We'll learn about range when we learn about generators. There are also less commonly used sequence types, such as bytes and bytearray. The things you can do with all sequences are slice, get len(), min() and max(), and test for membership with in/not in. Strings, lists and tuples can also be added together with + and multiplied by an integer with *.
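
As a quick illustration of the shared sequence operations, here is a minimal sketch. The variable names are made up for this example.

```python
# Three sequence types holding the same characters
as_string = "ABCD"
as_list = list(as_string)
as_tuple = tuple(as_string)

for seq in (as_string, as_list, as_tuple):
    # len(), min(), max(), membership and slicing work on every sequence
    print(type(seq).__name__, len(seq), min(seq), max(seq), "C" in seq, seq[1:3])

# + and * work on strings, lists and tuples
doubled = as_list * 2
joined = as_tuple + ("E",)
print(doubled)
print(joined)
```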

So far we know the following about lists.

- We define them by creating a collection of values inside square brackets, and assigning it to a variable (object) name
- We can make a list by using the list() function on another sequence (e.g. a string)
- We can get each item in a list individually by iterating over it with a for loop (for item in list:)
- We create an empty list by using empty square brackets my_list = []
- We need to create an empty list before we can append to it (say, inside a loop)
- We use .append() to append to a list
- We can count how many times a value occurs in a list with .count() just like we do in a string
- We use the del statement (del my_list[0]) to remove an item from a list
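
The points above can be sketched in a few lines. The names bases and no_a are hypothetical and used only for illustration.

```python
bases = list("GATTACA")      # list() on a string gives one item per character
print(bases.count("A"))      # how many times "A" occurs
del bases[0]                 # remove the item at index 0
print(bases)

no_a = []                    # the list must exist before we can append to it
for base in bases:
    if base != "A":
        no_a.append(base)
print(no_a)
```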

Let's review some of these things we know.

- In the cell below, define a list either by creating it manually or by converting a string.
- Iterate over the list with a for loop and print out each item.

In [2]:
my_String = "SHRUTI"
my_list = list(my_String)
for i in my_list:
    print(i)

S
H
R
U
T
I


- In the cell below, define an empty list.
- Make a for loop that creates items, e.g. numbers from a range, or random integers.
- Use the loop to append each item to the list.
- Print out the resulting list.

In [3]:
import random
emp_list = []
for _ in range(3):
    # the loop variable is not needed; just append a random integer from 1 to 10
    emp_list.append(random.randint(1,10))
print(emp_list)

[9, 5, 3]


# List operators #

First of all, we can manipulate lists with operators.

- Lists can be added together with +
- Lists can be multiplied by an integer value with *
- We can check for membership with in and not in
- We can use the slice operator [start:stop:step] to select a range of indexes in the list.
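
Here is a small sketch of the four operators on two made-up number lists (odds and evens are assumptions for this example, not part of the exercise).

```python
odds = [1, 3, 5, 7, 9]
evens = [2, 4, 6, 8, 10]

combined = odds + evens        # concatenation: a new 10-item list
repeated = evens * 2           # repetition: evens followed by evens again
has_four = 4 in odds           # membership test
middle = combined[2:8:2]       # indexes 2, 4 and 6
print(combined, repeated, has_four, middle)
```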

In the cell below:

- add together alpha_list and smalpha_list
- multiply smalpha_list by 2
- do a membership check (with in or not in) on alpha_list that will return False
- slice the first 10 characters of alpha_list
- slice out every third character of smalpha_list

In [8]:
alpha_list = list("ABCDEFGHIJKLMNOPQRSTUVWXYZ")
smalpha_list = list("abcdefghijklmnopqrstuvwxyz")
added = alpha_list + smalpha_list
multiplied = smalpha_list * 2
print(6 in alpha_list)
alpha_list[0:10:1]  # first 10 characters: indexes 0 through 9
smalpha_list[0:26:3]

False


['a', 'd', 'g', 'j', 'm', 'p', 's', 'v', 'y']

# More useful built-in functions for lists #

We already know we can get list length with len() just like for a string. Here are a few mathy functions that are useful with lists:

- We can use sum() to add together all the items in a list
- We can use min() and max() to find the least and greatest numbers in a list

- In the cell below, use sum() to add together the numbers in the list
- Use min() to find the smallest number
- Use max() to find the largest number
- Find out what is returned when you use sum(), min() and max() on a non-numerical list (like alpha_list, or a list you make from the names of people at your table)

In [12]:
my_numbers = [1,3,5,7,9,10,21]
sum(my_numbers)
min(my_numbers)
max(my_numbers)
sum(alpha_list)

TypeError: unsupported operand type(s) for +: 'int' and 'str'
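
The error above comes from sum(). min() and max() do work on a list of strings. A minimal sketch, using a made-up list of names:

```python
names = ["Maryam", "Aiden", "Brett"]   # example list made up for this sketch

# min() and max() compare strings alphabetically (by character code)
first_name = min(names)
last_name = max(names)
print(first_name, last_name)

# sum() starts from the integer 0, so adding a string to it fails
try:
    sum(names)
except TypeError as err:
    print("sum() failed:", err)

# "".join() is the usual way to combine a list of strings
joined = "".join(names)
print(joined)
```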

# Creating sets from lists with set() #

Say you have parsed a file and made a list from one of the fields in the file. And because of the way the data is, that list has lots of repeating elements.

['Aiden','Aiden','Aiden','Maryam','Maryam','Brett','Brett','Brett','Brett']

Say you want to regroup your data by getting every line that is to do with Aiden, every line that is to do with Maryam, etc. But you don't want to repeat searching with their names multiple times.

You could create a set of unique items from that list using set(list). A set represents the unique items that exist in your list, but not how many times they exist.

- In the cell below, create a set from the list above, and then print it out. 
- Can you iterate over the set like you would over a list? Try it out in a for loop and see.

I use this all the time, and you might be able to find a way to use it in the lab for this week, too.

In [14]:
this_list = ['Aiden','Aiden','Aiden','Maryam','Maryam','Brett','Brett','Brett','Brett']
this_set = set(this_list)
for i in this_set:
    print(i)

Brett
Maryam
Aiden
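
Here is one way the regrouping idea might look. The records list and its "name value" format are assumptions made up for this sketch.

```python
# Hypothetical records: each line starts with a name, then a value
records = ["Aiden 5", "Maryam 7", "Aiden 2", "Brett 9", "Maryam 1"]
names = [record.split()[0] for record in records]

grouped = {}
for name in sorted(set(names)):          # each unique name once, in sorted order
    grouped[name] = [r for r in records if r.split()[0] == name]
    print(name, grouped[name])
```

Wrapping the set in sorted() gives a predictable order, because a set by itself does not promise one.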


# Creating a new data structure from 2 lists with zip #

Zip will pair up the items in two lists and create a "zip object". The zip object doesn't look like much until you decide what it's going to be.

```
list1 = "ABCD"
list2 = "EFGH"
print(zip(list1,list2))
```

outputs something like: <zip object at 0x10a2ffd20>

If I say I would like to make the zip object a list, like this:

```newobj = list(zip(list1,list2))```

I will get a list of tuples. Tuples are like lists in () and they are immutable -- you can't change individual elements.

```[('A', 'E'), ('B', 'F'), ('C', 'G'), ('D', 'H')]```

A zip object is an iterator: Python pairs up the items as you ask for them, and stops when the shortest input runs out. You can also zip more than two lists together. If I do:

```
list3 = list("IJKL")
newobj = list(zip(list1,list2,list3))
```

Then I will get back:

```[('A', 'E', 'I'), ('B', 'F', 'J'), ('C', 'G', 'K'), ('D', 'H', 'L')]```
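
A zip object can also be looped over directly or turned into a dict. Here is a minimal sketch with made-up lists called letters and numbers.

```python
letters = list("ABCD")
numbers = [1, 2, 3, 4]

pairs = zip(letters, numbers)
for letter, number in pairs:      # each item is a tuple we can unpack
    print(letter, number)

leftover = list(pairs)            # the zip object is used up after one pass
print(leftover)

lookup = dict(zip(letters, numbers))   # a dict is another thing a zip object can become
print(lookup["C"])
```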

In the cell below, zip together the lists made from the alpha_list and the smalpha_list we created above.

In [18]:
this_obj = list(zip(alpha_list,smalpha_list))
print(this_obj)

[('A', 'a'), ('B', 'b'), ('C', 'c'), ('D', 'd'), ('E', 'e'), ('F', 'f'), ('G', 'g'), ('H', 'h'), ('I', 'i'), ('J', 'j'), ('K', 'k'), ('L', 'l'), ('M', 'm'), ('N', 'n'), ('O', 'o'), ('P', 'p'), ('Q', 'q'), ('R', 'r'), ('S', 's'), ('T', 't'), ('U', 'u'), ('V', 'v'), ('W', 'w'), ('X', 'x'), ('Y', 'y'), ('Z', 'z')]


# Creating a list of function results with map() #

If you have defined a function, and you have a list of objects you would like to call it on, you can use the map() function to apply it to each one. map() returns a map object, so wrap it in list() to get a new list. Run the cell below to see how this works.

In [16]:
def addfive(num):
    num += 5
    return num

numlist = [1,2,3,4,5]
list(map(addfive,numlist))

[6, 7, 8, 9, 10]

In the cell below, use a map command to make a list of gc percentages from this list of sequences.

In [19]:
def gcpercent(seq):
	gcpc = 100 * (seq.count("G") + seq.count("C")) / len(seq)
	return gcpc

listofseqs = ["AAAATTTTATAT", "GCCCGCCCGCGC", "ATATGCGCCGTA"]
list(map(gcpercent,listofseqs))

[0.0, 100.0, 50.0]
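
For comparison, here is a small sketch showing that map() does no work until something consumes it, and that a list comprehension gives the same result. The square function is a made-up example.

```python
def square(num):
    return num * num

numlist = [1, 2, 3, 4, 5]

mapped = map(square, numlist)               # a map object; nothing is computed yet
squares = list(mapped)                      # list() runs the function on every item
same_thing = [square(n) for n in numlist]   # equivalent list comprehension
print(squares, same_thing)
```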

# Finding the index where a pattern occurs with .index() #

.index() finds the position in the list where an element first occurs. If the element is not in the list, it raises a ValueError.

```
alpha_list = list("ABCDEFGHIJKLMNOPQRSTUVWXYZ")
alpha_list.index("B") # is equal to 1
alpha_list.index("Z") # is equal to 25
```

Why is this useful? If you can find the element using .index(), then you can do something with that index specifically, like for instance get a corresponding item out of a parallel list.

If I can find "cow" in animals, and I have a second list called animal_weights that is parallel, I can also get the cow's weight.

```
animals = ["monkey","dog","cat","cow","rabbit","horse"]
animal_weights = [15,50,9,1500,4,1100]
cow_weight = animal_weights[animals.index("cow")]
print(cow_weight)
```

Or I could zip the two lists together and use them that way. Either thing works! I can guarantee that you'll need .index() to solve the lab problem today.
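
Both approaches can be sketched side by side. The helper weight_of is hypothetical, written only for this example; it also guards against names that are not in the list.

```python
animals = ["monkey","dog","cat","cow","rabbit","horse"]
animal_weights = [15,50,9,1500,4,1100]

def weight_of(name):
    """Return the weight for name, or None if the animal is not listed.

    weight_of is a hypothetical helper, not part of the lesson.
    """
    if name not in animals:          # .index() would raise ValueError for a missing item
        return None
    return animal_weights[animals.index(name)]

print(weight_of("cow"))
print(weight_of("zebra"))

# The zip alternative: pair the lists up, then look up by name
weights_by_name = dict(zip(animals, animal_weights))
print(weights_by_name["rabbit"])
```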

In the cell below, find the index of the element "rs916977". 

In [20]:
SNP_list = ["rs12139042","rs1667394","rs916977","rs12080175","rs205478","rs4542213"]
SNP_list.index("rs916977")

2

# Sorting lists with .sort() and reversing them with .reverse() #

These two methods actually change the list in place. In other words, if you apply .sort() to a list, the list, which is mutable, is forever changed. Similarly, if you apply .reverse(), the order is changed.

animals = ["monkey","dog","cat","cow","rabbit","horse"]

In the cell below, apply .sort() and then .reverse() to animals. Print animals out between each step to see what happens.

In general, if we're using lists IN PARALLEL and their order is important to us, we don't want to sort or reverse them this way, because the parallelism will be lost forever. That's also why as we go on, we'll start using multi-dimensional data structures like matrices, dicts, and lists of tuples.
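
If parallel lists do need to be sorted, one possible approach is to zip them first so each weight travels with its animal. This is a sketch of that idea, not the only way to do it.

```python
animals = ["monkey","dog","cat","cow","rabbit","horse"]
animal_weights = [15,50,9,1500,4,1100]

paired = sorted(zip(animals, animal_weights))   # sorts by name; weights travel along
sorted_animals = [name for name, weight in paired]
sorted_weights = [weight for name, weight in paired]
print(sorted_animals)
print(sorted_weights)
```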

In [26]:
animals = ["monkey","dog","cat","cow","rabbit","horse"]
animals.sort()
print(animals)
animals.reverse()
print(animals)

['cat', 'cow', 'dog', 'horse', 'monkey', 'rabbit']
['rabbit', 'monkey', 'horse', 'dog', 'cow', 'cat']


# Using lists reversed or sorted without changing them #

We can also use the functions reversed() or sorted() when we access lists. These functions do not change the lists. sorted() returns a new sorted list, and reversed() returns an iterator you can loop over.

To see what I mean, start with the code in the cell below and see what it prints out. You'll see that the list is accessed in sorted order, but it's not changed. 

In the cells below, try the same with reversed(animals) and sorted(reversed(animals)).

In [27]:
animals = ["monkey","dog","cat","cow","rabbit","horse"]
for animal in sorted(animals):
    print(animal)
print(animals)

cat
cow
dog
horse
monkey
rabbit
['monkey', 'dog', 'cat', 'cow', 'rabbit', 'horse']


In [32]:
reversed(animals)
print(animals)
sorted(reversed(animals))
print(animals)

['monkey', 'dog', 'cat', 'cow', 'rabbit', 'horse']
['monkey', 'dog', 'cat', 'cow', 'rabbit', 'horse']
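
To actually see the reversed and sorted versions, the results need to be printed or stored. A minimal sketch (backwards and alphabetical are names made up here):

```python
animals = ["monkey","dog","cat","cow","rabbit","horse"]

backwards = list(reversed(animals))        # reversed() gives an iterator; list() collects it
alphabetical = sorted(reversed(animals))   # sorted() always returns a new list
print(backwards)
print(alphabetical)
print(animals)                             # still in the original order
```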


# Using a list comprehension #

A list comprehension is essentially a compressed for loop. Its components are [transformation, iteration, filter].

list2 = [i for i in list1] #base comprehension syntax -- returns a copy of list1. Only the iteration is used.
list2 = [i+5 for i in list1] #has a transformation at the beginning. Each i in list1 has 5 added to it.
list2 = [i+5 for i in list1 if i > 5] #has a transformation, an iteration and a filter (if statement)
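
To make the "compressed for loop" idea concrete, here is a sketch that builds the same list both ways. The values in list1 and the name list3 are made up for this example.

```python
list1 = [2, 4, 6, 8, 10]

# comprehension: transformation (i+5), iteration (for i in list1), filter (if i > 5)
list2 = [i+5 for i in list1 if i > 5]

# the same thing written as an ordinary for loop
list3 = []
for i in list1:
    if i > 5:
        list3.append(i+5)

print(list2, list3)
```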

In class last time I also showed you an example of a list comprehension for adding a newline to every string in a list while using .writelines to write to a file:

fo.writelines(["%s\n" % i for i in alpha_list])

Adding the newline is the transformation; for i in alpha_list is the iteration. There is no filter in this comprehension.
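
Here is a runnable sketch of that .writelines call. The temporary directory and the file name letters.txt are assumptions so the example cleans up after itself.

```python
import os
import tempfile

alpha_list = list("ABC")

with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "letters.txt")   # hypothetical file name
    with open(path, "w") as fo:
        # .writelines does not add newlines, so the comprehension adds them
        fo.writelines(["%s\n" % i for i in alpha_list])
    with open(path) as fi:
        contents = fi.read()

print(repr(contents))
```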

In the cell below, take alpha_list = "ABCDEFGHIJKLMNOPQRSTUVWXYZ" and transform its letters to lowercase letters, using a list comprehension.

In [34]:
alpha_list2 = [i.lower() for i in alpha_list]
print(alpha_list2)

['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j', 'k', 'l', 'm', 'n', 'o', 'p', 'q', 'r', 's', 't', 'u', 'v', 'w', 'x', 'y', 'z']


# Using a list comprehension to get parts of strings #

DNAseqs = ["AATTAATTAATT","ATTAATTAATTA","CCCCTTTTCCCC","GGTTGGTTGGTT","ACACACACACAC"]

In the cell below, use a comprehension to get seq[0:3] for seq in DNAseqs.

In [35]:
DNAseqs = ["AATTAATTAATT","ATTAATTAATTA","CCCCTTTTCCCC","GGTTGGTTGGTT","ACACACACACAC"]
DNAseq2 = [seq[0:3] for seq in DNAseqs]
print(DNAseq2)

['AAT', 'ATT', 'CCC', 'GGT', 'ACA']


# Turning file elements into lists #

I can also use a comprehension if I'm reading a file. Let's say I have a list of lines (which is just what .readlines turns a file into) with the format:

```
lines = ["SNPid, genotype, pattern", "SNPid2, genotype2, pattern", "SNPid3, genotype3, pattern", "SNPid4, genotype4, pattern2", "SNPid5, genotype5, pattern2",]
```

A comprehension could get just the first part of the line if the transformation included a .split() method.

If I use the comprehension:

```snpids = [line.split()[0] for line in lines]```

I get the result (the commas stay attached because .split() with no argument splits on whitespace only):

```snpids = ['SNPid,', 'SNPid2,', 'SNPid3,', 'SNPid4,', 'SNPid5,']```
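
If the trailing commas are not wanted, one option is to split on the separator itself. A minimal sketch, where clean_ids and patterns are names made up for this example:

```python
lines = ["SNPid, genotype, pattern", "SNPid2, genotype2, pattern", "SNPid3, genotype3, pattern"]

# splitting on ", " removes the separator from every field
clean_ids = [line.split(", ")[0] for line in lines]
print(clean_ids)

# an alternative: split on the comma alone, then strip leftover whitespace
patterns = [line.split(",")[2].strip() for line in lines]
print(patterns)
```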

In the cell below, add the comprehension statement that will get the second field (genotype) in the line.

In [37]:
lines = ["SNPid, genotype, pattern", "SNPid2, genotype2, pattern", "SNPid3, genotype3, pattern", "SNPid4, genotype4, pattern2", "SNPid5, genotype5, pattern2",]
genotypes = [line.split()[1] for line in lines]  # field 1 is the genotype
print(genotypes)

['genotype,', 'genotype2,', 'genotype3,', 'genotype4,', 'genotype5,']


For a reference list of list functions and methods, you can always go back here:

https://www.tutorialspoint.com/python/python_lists.htm
