# Chapter 9. Collections II: Sequence processing

I think this is the hardest chapter. Good luck!

I only introduce one new concept, the first-class function, and that isn't even that hard (I promise!). But this is the point where all the concepts start to coalesce, and you can really start writing interesting programs.

(The later chapters add smaller bites to your knowledge; it would be murderous to make them all this comprehensive.)

We talked about recursion. Now we need to talk about the other face of functional programming: first-class functions. This is best taught with an example.

In [8]:
def sumNos(a,b):
    return a+b

def subNos(a,b):
    return a-b

def opNos(a,b,operation):
    # operation is itself a function; call it on a and b
    return operation(a,b)

def testFunctional():
    print("{0:d} + {1:d} = {2:d}".format(3, 4, opNos(3, 4, sumNos)))
    print("{0:d} - {1:d} = {2:d}".format(3, 4, opNos(3, 4, subNos)))
    return

testFunctional()


3 + 4 = 7
3 - 4 = -1

See what I'm doing? I'm passing a function itself (not its name as a string, and not the result of calling it) as an argument to another function!

It's crazy!
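
Because a function is an ordinary value in Python, you can also store it in a variable or in a data structure and look it up later. Here's a minimal sketch. The names *operations*, *calculate*, and *add* are made up for this example, and *sumNos*/*subNos* are repeated so the cell runs on its own.

```python
def sumNos(a, b):
    return a + b

def subNos(a, b):
    return a - b

# A dictionary whose values are functions (not the results of calling them).
operations = {"+": sumNos, "-": subNos}

def calculate(a, symbol, b):
    # Look up the function by its symbol, then call it like any other function.
    operation = operations[symbol]
    return operation(a, b)

# A function can also be bound to a second name; both names refer to the same function.
add = sumNos

print(calculate(3, "+", 4))  # 7
print(calculate(3, "-", 4))  # -1
print(add(10, 5))            # 15
```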

This is a utility function I'll use here and there.

In [9]:
def isEven(num):
    return num % 2 == 0

Here's a function that takes a tuple and removes the elements where *condition*(element) is *false*. (Strictly speaking, it builds a new tuple without them, since a tuple can't be modified.)

In [10]:
def filterTuple(things, condition):
    if (len(things) <= 0): #base case
        return ()
    else: # I know the tuple is at least one thing long.
        if (condition(things[0])):
            # What's up with that things[0:1] slice?
            # How is it different from things[0]?
            return things[0:1] + filterTuple(things[1:], condition)
        else:
            return filterTuple(things[1:], condition)

In [11]:
#try
filterTuple((1,2,3,4,5,6,7,8,9),isEven)


(2, 4, 6, 8)

You didn't just skip *filterTuple*, did you?
Make sure you understand it.

Work it out, line by line. Stare at it and run it with different arguments. It can also be helpful to insert some print statements inside the function, like

In [12]:
#    if condition(things[0]):
#        print("condition(things[0]) was true, things = {0}".format(things))

This will help you see what is happening inside the function as it runs. 
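
Here's a sketch of what that tracing can look like once it's filled in. It is a copy of *filterTuple* under a made-up name, *filterTupleTraced*, with an extra made-up *depth* parameter. The parameter is used only to indent each message, so you can see how deep the recursion has gone.

```python
def isEven(num):
    return num % 2 == 0

def filterTupleTraced(things, condition, depth=0):
    # Indent each message by the recursion depth so nested calls line up.
    indent = "  " * depth
    print("{0}filterTupleTraced({1})".format(indent, things))
    if len(things) <= 0:  # base case: nothing left to filter
        return ()
    if condition(things[0]):
        print("{0}keep {1}".format(indent, things[0]))
        return things[0:1] + filterTupleTraced(things[1:], condition, depth + 1)
    else:
        print("{0}drop {1}".format(indent, things[0]))
        return filterTupleTraced(things[1:], condition, depth + 1)

print(filterTupleTraced((1, 2, 3, 4), isEven))
# filterTupleTraced((1, 2, 3, 4))
# drop 1
#   filterTupleTraced((2, 3, 4))
#   keep 2
#     filterTupleTraced((3, 4))
#     drop 3
#       filterTupleTraced((4,))
#       keep 4
#         filterTupleTraced(())
# (2, 4)
```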

Here is a similar function, this time for a list instead of a tuple, and using iteration rather than recursion.

In [13]:
def filterList(things, condition):
    retList = []
    for thing in things:
        if condition(thing):
            retList.append(thing)
    return retList
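
Here is *filterList* in use with two different conditions, along with Python's built-in *filter* for comparison. *startsWithA* is a made-up condition for this example. Note that the built-in takes its arguments in the opposite order, filter(function, iterable). It also returns a filter object rather than a list, which is why it is wrapped in list() below.

```python
def isEven(num):
    return num % 2 == 0

def startsWithA(s):
    # Hypothetical condition for this example: does the string begin with "A"?
    return s.startswith("A")

def filterList(things, condition):
    retList = []
    for thing in things:
        if condition(thing):
            retList.append(thing)
    return retList

numbers = [1, 2, 3, 4, 5, 6, 7, 8, 9]
print(filterList(numbers, isEven))                      # [2, 4, 6, 8]
print(filterList(["AGA", "TTA", "ATC"], startsWithA))   # ['AGA', 'ATC']

# The built-in filter does the same job: function first, then the sequence.
print(list(filter(isEven, numbers)))                    # [2, 4, 6, 8]
```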

*Filtering* is just one of a set of standard list-processing operations.

The others are *mapping*, *folding*, and *zipping*.
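
*Zipping* pairs up elements that sit at the same position in two or more sequences. Python has a built-in *zip* for this. It returns a zip object rather than a list, so it is wrapped in list() here to show the pairs. The bases and counts below are made-up values, just for illustration.

```python
bases = ["A", "G", "T", "C"]
counts = [5, 3, 7, 2]   # made-up numbers

# zip pairs up the elements at the same position in each sequence.
print(list(zip(bases, counts)))   # [('A', 5), ('G', 3), ('T', 7), ('C', 2)]

# zip stops as soon as the shortest input runs out.
print(list(zip("ABC", [1, 2])))   # [('A', 1), ('B', 2)]

# A common use: walking two sequences side by side in a for loop.
for base, count in zip(bases, counts):
    print("{0}: {1}".format(base, count))
```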

*Folding* is useful in certain languages, but Python's *for* loop syntax is
well-designed, so a fold is rarely clearer than an explicit *for* loop.

I'll spare you the gory details of *folding*. (It's like a *for* loop with an accumulator variable, like the one you used in *largestRand* in Chapter 07.)
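
As a rough sketch of that accumulator pattern, here is a sum written first as a *for* loop and then as a fold using *functools.reduce* from the standard library. The numbers are arbitrary. This isn't *largestRand* itself, though the *reduce(max, ...)* line at the end does the same kind of job.

```python
from functools import reduce

def sumNos(a, b):
    return a + b

numbers = [3, 1, 4, 1, 5, 9]

# Fold as a for loop: the accumulator starts at 0 and is updated once per element.
total = 0
for n in numbers:
    total = sumNos(total, n)
print(total)                        # 23

# The same fold with reduce(function, iterable, initial):
# it calls sumNos(accumulator, element) for each element in turn.
print(reduce(sumNos, numbers, 0))   # 23

# With no initial value, reduce starts from the first element.
print(reduce(max, numbers))         # 9
```

Compare the two versions: the loop spells out the accumulator, while *reduce* hides it. That is why the plain loop is usually easier to read in Python.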

*Mapping* takes a list and applies some operation to each element of the list, producing a new sequence of the results.

For example, to square the even numbers in a list:

In [14]:
def squareEven(n):
    return n**2 if n % 2 == 0 else n

In [16]:
#try:
map(squareEven, [1,2,3,4,5,6])

<map at 0x274a62a8080>

Uh-oh.

So it doesn't actually return a list of the mapped values. This
deserves some explanation: *map* is a "lazy" operation. That is, it doesn't
actually do anything until it absolutely needs to. We can force it to evaluate
immediately by calling *list*() on it:


In [18]:
list(map(squareEven, [1,2,3,4,5,6]))

[1, 4, 3, 16, 5, 36]
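
If it helps to see what *map* is doing, here's an eager version written in the same style as *filterList*. The name *mapList* is made up for this sketch. Unlike the built-in *map*, it builds the whole list right away.

```python
def squareEven(n):
    return n**2 if n % 2 == 0 else n

def mapList(things, operation):
    # Apply operation to every element and collect the results, in order.
    retList = []
    for thing in things:
        retList.append(operation(thing))
    return retList

print(mapList([1, 2, 3, 4, 5, 6], squareEven))   # [1, 4, 3, 16, 5, 36]
```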

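There are good reasons not to evaluate a *map* object completely (for example, the input might be huge or even endless, and you may only need the first few results). For our purposes, though, we'll usually listify it immediately.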
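
One way to watch the laziness happen is to give *map* a function that prints every time it runs. Creating the map object prints nothing; the calls happen only as values are pulled out of it. A map object can also be used up only once. *noisySquare* is a made-up helper for this demo, and the last line uses *count* and *islice* from the standard *itertools* module.

```python
from itertools import count, islice

def noisySquare(n):
    # Made-up helper: announces each call so you can see when map runs it.
    print("squaring {0}".format(n))
    return n * n

squares = map(noisySquare, [1, 2, 3])  # nothing is printed yet

print(next(squares))  # prints "squaring 1", then 1
print(list(squares))  # prints "squaring 2" and "squaring 3", then [4, 9]
print(list(squares))  # prints []: this map object has already been used up

# Laziness also lets map work on an endless sequence, as long as you only
# ask for finitely many results. count(-2) yields -2, -1, 0, 1, 2, ...
print(list(islice(map(abs, count(-2)), 5)))  # [2, 1, 0, 1, 2]
```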
 
Here's a function that traverses a tuple and displays every element, with a twist: if one of the elements is itself a tuple, this function traverses that tuple and prints its contents, too.


In [23]:
def printElems(things):
    if (len(things)>0):
        if isinstance(things[0], tuple):
            printElems(things[0])
        else:
            print(things[0])
    if(len(things) > 1):
        printElems(things[1:])
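
Here is *printElems* run on a tuple nested two levels deep. The input is just an arbitrary example. The function is repeated, with comments, so the cell runs on its own.

```python
def printElems(things):
    if len(things) > 0:
        if isinstance(things[0], tuple):
            printElems(things[0])   # descend into the nested tuple first
        else:
            print(things[0])
    if len(things) > 1:
        printElems(things[1:])      # then handle the rest of this tuple

printElems((1, (2, (3, 4)), 5))
# 1
# 2
# 3
# 4
# 5
```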

####   *Exercises*

1 - Give an example where this function would crash but *printElems* would work correctly.

In [24]:
def printElemBad(things):
    if isinstance(things[0], tuple):
        printElemBad(things[0])
    else:
        print(things[0])
    if(len(things) > 1):
        printElemBad(things[1:])

2 - Some doofus wrote the following buggy function. Fix it. *Bonus*: Fix it by adding only one character. 

In [25]:
def reverseSequence(seq):
    if(len(seq) <= 1):
        return seq
    else:
        return seq[-1] + reverseSequence(seq[0:-1])

3 - Given a string of bases (A, G, T, and C), produce a list of the three-letter codons that sequence gives you. If one or two letters are left over at the end of the sequence, ignore them.

In [26]:
#mkCodons("AGATTAGCCATCGGACTTGATGC") ->
#  ["AGA", "TTA", "GCC", "ATC", "GGA", "CTT", "GAT"]
def mkCodons(dna):
    pass

4 - Write a function that sums all of the elements in a list, including cases where one of the elements is another list.

In [27]:
#sumDeep([1, 2, [3, 4], 3]) -> 13
def sumDeep(lst):
    pass


5 - Write a function that takes a list and returns a new list with no duplicates in it. (The original list should be left unchanged.)

You may find it helpful to decompose this problem into several functions.

In [28]:
def stripDups(lst):
    pass

6 - A log file has the format:

In [29]:
#timestamp:data:note

Write a function that could be used with map to extract the data.

In [30]:
#list(map(getData, ("2012-1-5:124.51:Jen's plant dead",
#                   "2012-1-6:135.4:no note",
#                   "2012-1-17:156.425:Cream cheese stolen")))
#-> [124.51, 135.4, 156.425]

Note: The resulting list should contain *numbers*, not strings. You can use float() to convert a string to a number.

You may assume that there will be no stray colons in the note, so don't worry
about "*2012-1-8:105.64:Dan Fox: Please quit eating my food*"

If you're not sure how to do this, consider Googling it. Somebody has solved a similar problem before you...

In [31]:
def getData(line):
    pass

7 - Write a function that takes a tuple of DNA strings and returns the table of Hamming distances, as discussed earlier. As a reminder, the *Hamming distance* between two strings of equal length is the number of positions at which their letters differ.

To return a table, return a list of lists, with each inner list corresponding to one row (or one column; it doesn't matter, since the table is symmetric about the diagonal).

In [22]:
def hammingDistances(sequences):
    pass

def testHammingDistances():
    seq1 = "AGAGGTGCAGTC"
    seq2 = "AGACATGCATGA"    
    seq3 = "GGACATGCATGA"
    seq4 = "AGACTTGCATAA"
    seq5 = "AGACGTGCATAA"
    seq6 = "AGACGTGCATTC"
    print(hammingDistances((seq1, seq2, seq3, seq4, seq5, seq6)))
    #should print something like [[0, 3, 4], [3, 0, 2], [4, 2, 0]]
    #(of course, it will be 6x6, not 3x3)