# Chapter 8: Collections I: Sequence syntax.

In [16]:
from math import sqrt

Sequences are Python's main data structures. They offer a way to store more than one thing in a variable.

We start with the simplest sequence, the *tuple*. A *tuple* is an ordered collection of things.

In [17]:
def tupleMake():
    #We define a tuple by separating things with commas and surrounding them
    #with parentheses.
    #Here is a tuple of three numbers:
    threeNums = (1, 2, 3)
    #Tuples can have dissimilar elements:
    threeThings = (1, False, "astring")

    #A single element tuple is represented by a thing followed by a comma:
    oneThingTpl = (1,)
    #(See the low-level note below for an explanation of how this differs from:
    oneThingNotTpl = 1 #)

    #We represent the empty tuple by () 
    emptyTuple = ()

    #I can store *anything* in a tuple. Even another tuple!
    nestedTuple = (1, 2, (1, 3), ("hello", 4.5, ( False, False), ()))
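
As a quick check of the rules above, here is a small sketch (the variable names are mine, not part of the chapter's code) that asks Python what it actually built. Note that it is the comma, not the parentheses, that makes a one-element tuple.

```python
one_thing_tpl = (1,)      # trailing comma: a tuple with one element
one_thing_not_tpl = (1)   # just the number 1 wrapped in parentheses
empty_tuple = ()
nested = (1, 2, (1, 3))

print(type(one_thing_tpl))      # <class 'tuple'>
print(type(one_thing_not_tpl))  # <class 'int'>
print(len(empty_tuple))         # 0
print(len(nested))              # 3 (the inner tuple counts as one element)
```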

##### *Optional interlude*

Low-level explanation of sequences.

We will consider this snippet of code:

In [18]:
def lowLevel():
    a = 10
    b = (15,20)
    print(a)
    print(b)

Okay, this is going to get hairy. Brace yourself.

When Python encounters
the first line, it figures out that you want to do an assignment (it knows
this because the second element of the line is an = sign).

So, it evaluates the right-hand side of the statement. When Python interprets
the number 10, it finds some space in memory, and creates an object there. 

Of course, this object contains the number (in binary, 00001010).

But if the object only contained the binary number 00001010, when Python saw that
memory, it wouldn't know it was a number.

It could be a date, it could be a character, it could be anything. 

So, Python stores a second thing in this object: the type of the object.
Here's what memory looks like when Python creates the object for the number 10:

In [19]:
#    ENVIRONMENT           HEAP
#    ________             _______
#   |        |         1 |       |
#   |        |         2 |       |
#   |        |         3 |       |
#   |        |         4 |int 10 | <--
#   |        |         5 |       |
#   |        |         6 |       |
#   |        |         7 |       |
#   |        |         8 |       |
#   |        |         9 |       |
#   |________|        10 |_______|

The arrow indicates something has changed.

The "heap" is the bulk of memory. All objects (in Python, at least) are
stored on the heap. I'll get to the environment part next, but for now,
the picture is that Python has found some free space on the heap
and stored the number object there. (It picked address 4, which just
corresponds to the fourth word of memory Python has access to.)
So we're done with the right-hand side of the line "a = 10".

Now, Python binds *a* to the object created. It stores this 'binding' in
the environment, so any time the code refers to '*a*', Python will know
where in memory to fetch the value. 


In [None]:
#    ENVIRONMENT           HEAP
#    ________             _______
#   |a     4 | <--     1 |       |
#   |        |         2 |       |
#   |        |         3 |       |
#   |        |         4 |int 10 |
#   |        |         5 |       |
#   |        |         6 |       |
#   |        |         7 |       |
#   |        |         8 |       |
#   |        |         9 |       |
#   |________|        10 |_______|

Now, any time the code refers to '*a*', Python will search the environment,
see that '*a*' is stored at address 4 in memory, and fetch the number object
created earlier. 
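
If you want to poke at this picture yourself, the built-in functions type and id expose the two pieces described above. Treat this as a sketch: the number id returns is an identity, which happens to be the memory address in CPython (the standard interpreter), and it will be some large number rather than a tidy 4.

```python
a = 10

# Every object remembers its own type.
print(type(a))   # <class 'int'>

# id() gives the identity of the object that a is bound to.
# The exact value changes from run to run, so none is shown here.
print(id(a))
```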

Now, we get to the creation of a tuple. When Python sees "b = (15,20)",
it first creates each element of the tuple on the heap - in this case, 
that's the numbers 15 and 20:

In [6]:
#    ENVIRONMENT           HEAP
#    ________             _______
#   |a     4 |         1 |int 15 | <--
#   |        |         2 |       |
#   |        |         3 |int 20 | <--
#   |        |         4 |int 10 | 
#   |        |         5 |       |
#   |        |         6 |       |
#   |        |         7 |       |
#   |        |         8 |       |
#   |        |         9 |       |
#   |________|        10 |_______|

Now, it creates the tuple itself. The tuple is just a length, followed by the addresses of each of its elements.

In [7]:
#    ENVIRONMENT           HEAP
#    ________             _______
#   |a     4 |         1 |int 15 |
#   |        |         2 |       |
#   |        |         3 |int 20 |
#   |        |         4 |int 10 |
#   |        |         5 |       |
#   |        |         6 |tpl  2 | <-- 
#   |        |         7 |     1 | <-- 
#   |        |         8 |     3 | <--
#   |        |         9 |       |
#   |________|        10 |_______|

The first line (address 6) identifies the object as a tuple of length 2.
Then, the next two lines give the addresses of each of the elements. Here, the
elements are stored at address 1 and 3, in that order. 

Now that the tuple has been created, we create a binding for the variable '*b*'
to that tuple:

In [8]:
#    ENVIRONMENT           HEAP
#    ________             _______
#   |a     4 |         1 |int 15 |
#   |b     6 | <--     2 |       |
#   |        |         3 |int 20 |
#   |        |         4 |int 10 |
#   |        |         5 |       |
#   |        |         6 |tpl  2 |
#   |        |         7 |     1 |
#   |        |         8 |     3 |
#   |        |         9 |       |
#   |________|        10 |_______|

So, when Python encounters the statement "*print(a)*", it first looks up the 
symbol '*a*' in the environment, then fetches the object at address 4. 
It passes that object to the print function, which converts it to the 
appropriate characters to display on the screen. 
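
To see that a tuple really holds the addresses of its elements rather than copies of them, you can compare identities with the is operator. A minimal sketch, with names of my own choosing:

```python
fifteen = 15
twenty = 20
b = (fifteen, twenty)

# b[0] fetches the first element (the [] syntax is covered after this interlude).
# The tuple's slots refer to the very same objects the names are bound to.
print(b[0] is fifteen)  # True
print(b[1] is twenty)   # True
print(len(b))           # 2
```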

So now, what's the difference between

*c* = 7
and 
*d* = (7,)
? 

Well, after running those lines, our environment and heap would look like:

In [9]:
#    ENVIRONMENT           HEAP
#    ________             _______
#   |a     4 |         1 |int 15 |
#   |b     6 |         2 |int  7 | <--
#   |c    10 | <--     3 |int 20 |
#   |d    12 | <--     4 |int 10 |
#   |        |         5 |       |
#   |        |         6 |tpl  2 |
#   |        |         7 |     1 |
#   |        |         8 |     3 |
#   |        |         9 |       |
#   |        |        10 |int  7 | <--
#   |        |        11 |       |
#   |        |        12 |tpl  1 | <--
#   |        |        13 |     2 | <--
#   |        |        14 |       |
#   |        |        15 |       |
#   |        |        16 |       |
#   |________|        17 |_______|

Make sense? Okay, hotshot. How about this:

*e* = (7, 8, (9, 10))

*e* contains THREE elements. Its last element happens to be another tuple.

In [20]:
#    ENVIRONMENT           HEAP
#    ________             _______
#   |a     4 |         1 |int 15 |
#   |b     6 |         2 |int  7 |
#   |c    10 |         3 |int 20 |
#   |d    12 |         4 |int 10 |
#   |e    14 | <--     5 |int  7 | <--
#   |        |         6 |tpl  2 |
#   |        |         7 |     1 |
#   |        |         8 |     3 |
#   |        |         9 |int  8 | <--
#   |        |        10 |int  7 |
#   |        |        11 |int 10 | <--
#   |        |        12 |tpl  1 |
#   |        |        13 |     2 |
#   |        |        14 |tpl  3 | <--
#   |        |        15 |     5 | <--
#   |        |        16 |     9 | <--
#   |        |        17 |    19 | <--
#   |        |        18 |int  9 | <--
#   |        |        19 |tpl  2 | <--
#   |        |        20 |    18 | <--
#   |        |        21 |    11 | <--
#   |________|        22 |_______|

I've put the elements in no particular order in memory, because Python puts
them in no particular order. 
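
However the pieces land in memory, the tuple looks the same from the code's point of view. A short sketch checking the claims about *e* (the name inner is mine):

```python
e = (7, 8, (9, 10))

print(len(e))       # 3 (the inner tuple counts as one element)
inner = e[2]        # [] fetches an element; it is explained right after this interlude
print(inner)        # (9, 10)
print(len(inner))   # 2
```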

##### *End - Optional interlude*

Okay, now that you can create tuples, how do you get information from tuples?

In [21]:
def extractInfo():
    a = (8, 9, 10, 11)
    #We get elements from a tuple using [] syntax.
    #Each element of a tuple is numbered, starting with 0.
    #(This is a common source of subtle bugs.)
    print(a[0])

    #We can step through the elements of a tuple...
    for i in range(4):
        print(a[i])

    #We can also iterate over the tuple directly.
    for val in a:
        print(val)
    #(Interestingly, range(4) acts (basically) like the tuple (0,1,2,3): it
    #produces those same four numbers, in order.)
    
    #There are some fancier ways to get subsets of a tuple; these are
    #called slices.
    #tupleName[start:end]
    #for example,
    print(a[1:3])
    #Run this function. Are you surprised by the result of that slice?
    #When you perform a slice like this, your slice *includes* the start
    #element and stops *before* the end index.

    #If you leave off the start or end in a slice, you only clip the tuple
    #in one direction:
    print(a[:3])
    #(This prints three elements of the tuple: a[0], a[1], and a[2]. Note that
    #it does not print a[3].)
    #Uh, what else? Oh! You can use a negative index to get a value relative
    #to the end of the tuple. a[-1] is the last element, a[-2] is the second-
    #to-last, and so on.
    print(a[-1])
 
    #There's one last handy feature for tuples: element-wise assignment.
    x, y = (a, "sheep")
    #is equivalent to
    x=a
    y="sheep"
    #(It is an error to have a different number of elements on one side.)

There are a few functions that are so common on tuples they're always available:

In [22]:
def builtinTuple():
    a = (1, 2, "hello", True)
    #len returns the length of a collection.
    #(print already puts a space between its arguments.)
    print("a has", len(a), "elements in it.")

    # + can be used to concatenate tuples:
    b = ("aoeu", "snth")
    print(a+b)

    #For tuples of numbers, there is the sum function to add them all up.
    c = (10, 13, 16, 31)
    print( sum(c))

    #We can check to see if something is a tuple with the isinstance function.

    print(isinstance( (3, 4, 5), tuple))
    print(isinstance( 1.3, tuple))
    
    #You may find isinstance useful in list processing, next chapter. 
    #As you might guess, you can also use isinstance to see if something is
    #another type, like int, str, or list. 
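
As a sketch of how isinstance might be put to work, the hypothetical helper below (countTuples is my own name, not a builtin) counts how many elements of a tuple are themselves tuples.

```python
def countTuples(tpl):
    """Return how many elements of tpl are themselves tuples."""
    count = 0
    for item in tpl:
        if isinstance(item, tuple):
            count = count + 1
    return count

mixed = (1, (2, 3), "hello", (), 4.5)
print(countTuples(mixed))          # 2
print(isinstance("hello", str))    # True
print(isinstance([1, 2], tuple))   # False
```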

 

There are other sequences: Lists are much like tuples, but they can be modified.

Strings are immutable like tuples, but they have some convenient syntax since we use them so much.

In [23]:
def listMake():
    #lists use brackets where tuples use parentheses:
    a = [1, 2, 3]
    #Like tuples, the elements of a list can be anything,
    #even other lists or tuples.
    #Unlike tuples, it's valid to change an element in a list.
    a[1]=5

    #If you have a tuple (or string or map object or whatever) and want to
    #make a list, use the list builtin:
    b = list((6, 4.3))

    #If you need a long list to use as a placeholder and don't care what's
    #in it, this trick will make a list that is 1000 items long and just has
    #the numbers 0 to 999 in it.
    #(The idea being you'll put something else in it later.)
    longList=list(range(1000))
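
A few related sketches: list also accepts strings, and if you would rather fill a placeholder list with one repeated value than with 0 to 999, multiplying a one-element list by a number is a common alternative. The last part shows the mutability difference directly.

```python
letters = list("abc")
print(letters)            # ['a', 'b', 'c']

zeros = [0] * 1000        # 1000 copies of 0
print(len(zeros))         # 1000
print(zeros[:5])          # [0, 0, 0, 0, 0]

# Lists can be changed in place; tuples cannot.
nums = [1, 2, 3]
nums[1] = 5
print(nums)               # [1, 5, 3]

frozen = (1, 2, 3)
try:
    frozen[1] = 5
except TypeError as err:
    print("tuples are immutable:", err)
```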

In addition to the operations discussed for tuples, lists have methods for modification.

In [24]:
def listMethods():
    a = [1.5, 2.4]
    #append adds the indicated item to the end of the list.
    a.append(5)
    #This is a new syntactic form, called a method. This will make a lot
    #of sense after CH15, but for now just roll with it. It's basically a
    #function call, but we say what thing we want the function to operate on
    #by using thing.function()
    #If you want to append multiple items, either iterate through them:
    b = [3, 1.5, 4]
    for elem in b:
        a.append(elem)
    #or use the convenient extend syntax. (it does the same thing.)
    a.extend(b)
    #Note the difference between
    x = [1,2,3]
    x.extend([4,5,6])
    print(x)
    #and
    y = [1,2,3]
    y.append([4,5,6])
    print(y)
    #There's even a method to sort the elements.
    c = [1,4,7,2,5,8]
    c.sort()
    print (c)
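
Here is what those print calls show, side by side. The sketch also points out one wrinkle: sort changes the list in place and returns None, while the builtin sorted leaves the original alone and hands back a new list.

```python
x = [1, 2, 3]
x.extend([4, 5, 6])
print(x)            # [1, 2, 3, 4, 5, 6]

y = [1, 2, 3]
y.append([4, 5, 6])
print(y)            # [1, 2, 3, [4, 5, 6]]  the whole list became one element
print(len(y))       # 4

c = [1, 4, 7, 2, 5, 8]
d = sorted(c)       # new sorted list; c is untouched
print(c)            # [1, 4, 7, 2, 5, 8]
print(d)            # [1, 2, 4, 5, 7, 8]

result = c.sort()   # sorts c in place
print(c)            # [1, 2, 4, 5, 7, 8]
print(result)       # None
```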

Lists have the same slicing syntax as tuples.

The mutability of lists can really trip you up, though.

In [25]:
def listMut():
    a = [1, 2, 3]
    b = [1, 2, 3]
    c = a
    a[0] = 5
    print("a = ", a)
    print("b = ", b)
    print("c = ", c)

##### *Begin Optional interlude*

To explain this, we need to go back to our environment-heap memory explanation.

First, we create the two lists *a* and *b*:

In [None]:
#    ENVIRONMENT           HEAP
#    ________             _______
#   |a     1 |         1 |lst  3 |
#   |b     8 |         2 |     5 |
#   |        |         3 |     6 |
#   |        |         4 |     7 |
#   |        |         5 |int  1 |
#   |        |         6 |int  2 |
#   |        |         7 |int  3 |
#   |        |         8 |lst  3 |
#   |        |         9 |    12 |
#   |        |        10 |    13 |
#   |        |        11 |    14 |
#   |        |        12 |int  1 |
#   |        |        13 |int  2 |
#   |        |        14 |int  3 |
#   |________|        15 |_______|

Now, we get to the *c* = *a* line.

In this case, Python doesn't duplicate the list itself. Rather, it just says that *c* is another name for the same object
as *a*.

In [26]:
#    ENVIRONMENT           HEAP
#    ________             _______
#   |a     1 |         1 |lst  3 |
#   |b     8 |         2 |     5 |
#   |c     1 | <--     3 |     6 |
#   |        |         4 |     7 |
#   |        |         5 |int  1 |
#   |        |         6 |int  2 |
#   |        |         7 |int  3 |
#   |        |         8 |lst  3 |
#   |        |         9 |    12 |
#   |        |        10 |    13 |
#   |        |        11 |    14 |
#   |        |        12 |int  1 |
#   |        |        13 |int  2 |
#   |        |        14 |int  3 |
#   |________|        15 |_______|

The heap hasn't changed at all!

Now, when you assign to an element of *a*, it creates the new value (the number
5 in this case) and rewires the list so the first element now points to the new object:

In [27]:
#    ENVIRONMENT           HEAP
#    ________             _______
#   |a     1 |         1 |lst  3 |
#   |b     8 |         2 |    15 | <--
#   |c     1 |         3 |     6 |
#   |        |         4 |     7 |
#   |        |         5 |int  1 |
#   |        |         6 |int  2 |
#   |        |         7 |int  3 |
#   |        |         8 |lst  3 |
#   |        |         9 |    12 |
#   |        |        10 |    13 |
#   |        |        11 |    14 |
#   |        |        12 |int  1 |
#   |        |        13 |int  2 |
#   |        |        14 |int  3 |
#   |        |        15 |int  5 | <--
#   |________|        16 |_______|

So when I mutate *a*, I am mutating the memory that *c* points to.

Note that this behavior is confusing and a great source of bugs - for this
reason, copying a list's pointer is not encouraged.

If you want to copy a list on the heap, you can make an entire slice:
*d* = *a*[:]

This will copy *a*'s contents, so changing *a* will not change *d*.

##### *End optional interlude*

If you chose to skip the previous section:
    
Different variables can refer to the same list.

To get a copy of a list, use the whole slice:
    
    c = a[:] 
    
will create a new list you can mess with as much as you want.
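
A minimal sketch of the difference, using is to ask whether two names refer to the same object (the names alias, copy, outer, and outerCopy are mine). One caveat worth flagging: the whole slice makes a shallow copy, so any lists nested inside the list are still shared.

```python
a = [1, 2, 3]
alias = a        # another name for the same list
copy = a[:]      # a new list with the same elements

a[0] = 5
print(alias)         # [5, 2, 3]
print(copy)          # [1, 2, 3]
print(alias is a)    # True
print(copy is a)     # False

# Shallow copy: the inner list is shared between outer and outerCopy.
outer = [[1, 2], [3]]
outerCopy = outer[:]
outer[0].append(99)
print(outerCopy[0])  # [1, 2, 99]
```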


Strings are not mutable; in this sense they are like tuples.

In [28]:
def stringMake():
    #You already know the basics of strings. 
    a = "hi."
    #You can slice them just like lists and tuples. 
    #The best part about strings is the formatting method.
    #If you put some flags into a string, Python will inject your desired
    #information into the string on the fly:
    
    print ("{0:s}, world!".format("Hello"))
    #When python gets to the {}, it looks at the arguments to the 
    #format method at the end of the string.
    #The insides of a format specifier are: 
    #{<position in argument list> : [extra information] [type code]}
    #This chapter always writes the position. (Python 3.1 and later will
    #number the fields in order if you leave it out.) The items in
    #[brackets] are optional.
    #The code {0:s} indicates that Python should take the first argument
    #to format (remember, it's zero-based) and format it as a string
    #(hence the "s"). (The [extra information] was empty in this case)
    
    #Here are the type codes:
    # s :: Format as a string. See example above.
    #      Note: The s code only accepts strings. Most other objects can be
    #      stringified by leaving the type code off.
    #      For example, "{0}".format((1,2)) will return '(1, 2)'
    # d :: format as an integer. (d stands for decimal, as in base 10.)
    # E :: format in scientific notation.
    # f :: Floating point number. (non-whole numbers.)
    # g :: Intelligently choose E or f depending on how big or small the number is.
    # E, f, and g have some funky [extra information] flags you can use.
    # [flags][width[.precision]]
    # Flags are zero or more of:
    # < :: left-justify the result.
    # + :: show positive signs as well as the usual negative signs.
    #  (there are lots more, check out
    # http://docs.python.org/3.3/library/string.html if you need to do anything
    # really fancy.)
    #For example, "{0:+f}".format(0.3) -> +0.300000
    #Width is the minimum size of the total field, including decimal points,
    #signs, the works. Note that if the width is too small, the output will
    #not be truncated; the width specifier will simply be ignored.
    #Precision, if used, must be preceded by a period. It can be used with
    #or without a width.
    #Oh, the d code supports all the flags but precision. (because there are
    #*never* any digits after the decimal in an integer.)
    
    print ("{0:+4.5f}".format(4521.3135))
    #That monster format specifier says:
    # 0 Take the first argument to format
    # + Display a plus sign if possible.
    # 4 Use four characters for the field width (or more if necessary)
    # .5 Display 5 digits after the decimal. (so the 4 will almost certainly
    #          be ignored.)
    # f Show as a normal decimal number. (not scientific notation)
    
    
    #Example:
    baseStr = "The square root of {0:5d} is {1:7.4f} and the square of {0:5d} is {2:5d}."
    for num in (0, 1, 3, 5, 25):
        print(baseStr.format(num, sqrt(num), num*num))
    #Explanation of each code:
    
    # {0:5d} is pretty simple - take the first argument to format 
    #(remember, zero-based!) and format it as a decimal number. 
    #Pad it with spaces to five characters. 
    
    #{1:7.4f} is the most complex. Take the second argument to
    #format and render it as a seven-character wide floating point
    #number with four places after the decimal.
    
    #{0:5d} is exactly the same as the first time. I'm using the
    #first argument twice in the string.
    
    #{2:5d} should be familiar - use the third argument, format it
    #as an integer, and pad it to five characters wide.


In [29]:
####   *Exercises*

1 - Write a function that takes a tuple of numbers and prints their mean,
without using the sum() function for tuples.

The printed response should be formatted using this style:
*tupleAve*((1, 3, 5)) ->  The average of (1, 3, 5) is 3.000000

In [30]:
def tupleAve(tpl):
    pass

2 - If you did the optional interlude, explain the difference between 'a=1' and 'a=(1,)'

3 - Write a function that takes a list and two numbers and swaps the elements at those positions.

Note that the function may or may not return something, but the original list should be mutated. (This is called an "in-place" change)
That is,

a=[1,4,6]

swap(a, 1, 2)

print (a) -> [1, 6, 4]


In [31]:
def swap(lst, pos1, pos2):
    pass

4 -  Write a function that takes a list and two numbers and returns a list with the elements at those positions swapped.
The original list should *not* be mutated. 

(Functions that don't mutate their arguments or access any global state are called pure.)

a = [1,4,6]

b = swapConst(a,1,2)

print (b) -> [1, 6, 4]

print (a) -> [1, 4, 6]


In [32]:
def swapConst(lst, pos1, pos2):
    pass

5 - Write a function that checks to see if a list of numbers is sorted.

In [33]:
def isSorted(lst):
    pass

6 - What are *x*, *y*, and *z* at the end of this code?

Don't run it, work through it by hand.

(Note: This is a hard exercise. Feel free to use some scratch paper or whiteboard/window space.)

In [34]:
# x = (1,)
# y = x+x+(2,)
# x = (y,x)
# z = x[0][1]
# x,y = (x+(z,y))[:2]

7 - Write a function to convert numeric gradebook grades into ten-character wide strings. It should return a string, not print it. The result should have two numbers after the decimal and be followed by a % sign.

The *total* returned string should be 10 characters wide.

(Remember to count characters not in your format specifier (such as the % sign) in this 10.)

*formatGrade(67.133) -> '    67.13%'*

In [35]:
def formatGrade(grade):
    pass
