# Lesson 10: Tuples

The tuple is a Python data structure that is like a simple and efficient list.

## Tuples are Immutable

Video: Tuples - Part 1

### https://www.youtube.com/watch?v=CaVhM65wD6g 

A tuple is a sequence of values much like a list. The values stored in a tuple can be any type, and they are indexed by integers. The important difference is that tuples are *immutable*. Tuples are also *comparable* and *hashable*, so we can sort lists of them and use tuples as keys in Python dictionaries.

- Fun fact: The word "tuple" comes from the names given to sequences of numbers of varying lengths: single, double, triple, quadruple, quintuple, sextuple, septuple, etc.

Syntactically, a tuple is a comma-separated list of values:

In [1]:
t = 'a', 'b', 'c', 'd', 'e'
t

('a', 'b', 'c', 'd', 'e')

Although it is not necessary, it is common to enclose tuples in parentheses to help us quickly identify tuples when we look at Python code:


In [3]:
t = ('a', 'b', 'c', 'd', 'e')
t

('a', 'b', 'c', 'd', 'e')

To create a tuple with a single element, you have to include the final comma:

In [5]:
t1 = ('a',)
type(t1)

tuple

In [7]:
t1 = 'a',
type(t1)

tuple

Without the comma, Python treats `('a')` as an expression with a string in parentheses that evaluates to a string:

In [9]:
t2 = ('a')
type(t2)

str

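Another way to create a tuple is the built-in function `tuple`. With no argument, it creates an empty tuple:

In [13]: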
t = tuple()
print(type(t))
print(t)

<class 'tuple'>
()


If the argument is a sequence (string, list, or tuple), the result of the call to `tuple` is a tuple with the elements of the sequence:

In [15]:
t = tuple('lupins')
print(t)

('l', 'u', 'p', 'i', 'n', 's')


Because `tuple` is the name of a constructor, you should avoid using it as a variable name.

Most list operators also work on tuples. The bracket operator indexes an element:

In [18]:
t = ('a', 'b', 'c', 'd', 'e')
print(t[0])

a


And the slice operator selects a range of elements.

In [19]:
print(t[1:3])

('b', 'c')


But if you try to modify one of the elements of the tuple, you get an error:

In [21]:
t[0] = 'A'
t

TypeError: 'tuple' object does not support item assignment

You can't modify the elements of a tuple, but you can replace one tuple with another:

In [23]:
t = ('A',) + t[1:]
print(t)

('A', 'b', 'c', 'd', 'e')

## Comparing Tuples

Video: Tuples - Part 2

### <https://www.youtube.com/watch?v=FdUdA6o0Ij0>

The comparison operators work with tuples and other sequences. Python starts by comparing the first element from each sequence. If they are equal, it goes on to the next element, and so on, until it finds elements that differ. Subsequent elements are not considered (even if they are really big).

In [27]:
(0, 1, 2) < (0, 3, 4)

True

In [30]:
(0, 1, 2000000) < (0, 3, 4)

True

The list method `sort` works the same way. It sorts primarily by first element, but in the case of a tie, it sorts by second element, and so on.

This feature lends itself to a pattern called **DSU**, which stands for:

- **Decorate** a sequence by building a list of tuples with one or more sort keys preceding the elements from the sequence,
- **Sort** the list of tuples using the Python built-in `sort`, and
- **Undecorate** by extracting the sorted elements of the sequence.

For example, suppose you have a list of words and you want to sort them from longest to shortest:

In [31]:
txt = 'but soft what light in yonder window breaks'
words = txt.split()
t = list()
for word in words:
    t.append((len(word), word))

t.sort(reverse=True)

res = list()
for length, word in t:
    res.append(word)

print(res)

['yonder', 'window', 'breaks', 'light', 'what', 'soft', 'but', 'in']


The first loop builds a list of tuples, where each tuple is a word preceded by its length.

`sort` compares the first element, length, first, and only considers the second element to break ties. The keyword argument `reverse=True` tells `sort` to go in decreasing order.

The second loop traverses the list of tuples and builds a list of words in descending order of length. The four-character words are sorted in reverse alphabetical order, so "what" appears before "soft" in the output above.

Of course the line loses much of its poetic impact when turned into a Python list and sorted in descending word length order.
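The same ordering can also be sketched without the explicit decorate and undecorate loops by passing a `key` function to the built-in `sorted`. This is an alternative to the DSU code above, not what the lesson's program does; the key returns the same `(length, word)` tuple that the decorate step builds:

```python
txt = 'but soft what light in yonder window breaks'
words = txt.split()

# The key function builds the (length, word) tuple for each word on the fly,
# so ties in length are still broken by the word itself.
res = sorted(words, key=lambda word: (len(word), word), reverse=True)
print(res)
```

Because the key is a tuple, the comparison rules are exactly the ones described in this section: length first, then the word.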

## Tuple Assignment

One of the unique syntactic features of the Python language is the ability to have a tuple on the left side and a sequence on the right side of an assignment statement. This allows you to assign more than one variable at a time to the given sequence.

In this example we have a two-element list (which is a sequence) and assign the first and second elements of the sequence to the variables `x` and `y` in a single statement.

In [34]:
m = [ 'have', 'fun' ]
x, y = m
x

'have'

In [35]:
y

'fun'

It is not magic. Python does not translate the syntax literally (for example, if you try this with a dictionary, it will not work as you might expect), but *roughly* it translates the tuple assignment to the following:

In [36]:
m = [ 'have', 'fun' ]
x = m[0]
y = m[1]
print(x)
print(y)

have
fun
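To illustrate the caveat about dictionaries, here is a small sketch (the dictionary `d` is invented for this example). Unpacking a dictionary iterates over its keys, so the variables receive keys rather than values, and indexing with `d[0]` fails because `0` is not a key:

```python
d = {'have': 1, 'fun': 2}

# Tuple assignment iterates over the dictionary, which yields its keys.
x, y = d
print(x, y)

# The "rough translation" x = d[0] does not apply: 0 is not a key here.
try:
    x = d[0]
except KeyError:
    print('d[0] raises KeyError')
```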

Stylistically when we use a tuple on the left side of the assignment statement, we omit the parentheses, but the following is an equally valid syntax:
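A minimal sketch of the parenthesized form, reusing the list `m` from the earlier example:

```python
m = ['have', 'fun']

# Parentheses around the left-hand side are optional; this is the same
# tuple assignment as x, y = m.
(x, y) = m
print(x)
print(y)
```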

A particularly clever application of tuple assignment allows us to swap the values of two variables in a single statement:

In [37]:
a, b = 1, 2
a, b = b, a
print(a, b)

2 1

Both sides of this statement are tuples, but the left side is a tuple of variables; the right side is a tuple of expressions. Each value on the right side is assigned to its respective variable on the left side. All the expressions on the right side are evaluated before any of the assignments.

The number of variables on the left and the number of values on the right must be the same:

In [39]:
a, b = 1, 2, 3

ValueError: too many values to unpack (expected 2)

More generally, the right side can be any kind of sequence (string, list, or tuple). For example, to split an email address into a user name and a domain, you could write:

In [40]:
addr = 'monty@python.org'
uname, domain = addr.split('@')

The return value from `split` is a list with two elements; the first element is assigned to `uname`, the second to `domain`.

In [42]:
print(uname)
print(domain)

monty
python.org


## Dictionaries and Tuples

Dictionaries have a method called `items` that returns a view of the dictionary's key-value pairs as tuples. Passing that view to `list` gives a list of tuples:



In [45]:
d = {'b':1, 'a':10, 'c':22}
t = list(d.items())
print(t)


[('b', 1), ('a', 10), ('c', 22)]


The items appear in the order they were inserted (dictionaries preserve insertion order since Python 3.7), which here is not alphabetical.

However, since the list of tuples is a list, and tuples are comparable, we can now sort the list of tuples. Converting a dictionary to a list of tuples is a way for us to output the contents of a dictionary sorted by key:

In [46]:
d = {'b':1, 'a':10, 'c':22}
t = list(d.items())
t

[('b', 1), ('a', 10), ('c', 22)]

In [48]:
t.sort()
t

[('a', 10), ('b', 1), ('c', 22)]

The new list is sorted in ascending alphabetical order by the key value.
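The convert-then-sort steps can also be written in one expression with the built-in `sorted`, which accepts the `items` view directly and returns a new sorted list. A short sketch using the same dictionary:

```python
d = {'b': 1, 'a': 10, 'c': 22}

# sorted() iterates over the (key, value) tuples and compares them
# element by element, so the result is ordered by key.
for key, val in sorted(d.items()):
    print(key, val)
```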

## Multiple Assignments with Dictionaries


Combining `items`, tuple assignment, and `for`, you can see a nice code pattern for traversing the keys and values of a dictionary in a single loop:



In [49]:
d = {'a':10, 'b':1, 'c':22}
for key, val in list(d.items()):
    print(val, key)

10 a
1 b
22 c


This loop has two iteration variables because `items` yields key-value tuples and `key, val` is a tuple assignment that successively takes each of the key-value pairs in the dictionary.

For each iteration through the loop, both `key` and `val` are advanced to the next key-value pair in the dictionary. Since Python 3.7 the pairs come out in insertion order, which is not necessarily sorted order.

If we combine these two techniques, we can print out the contents of a dictionary sorted by the value stored in each key-value pair.

To do this, we first make a list of tuples where each tuple is `(value, key)`. The `items` method would give us `(key, value)` tuples, but this time we want to sort by value, not key. Once we have constructed the list with the value-key tuples, it is a simple matter to sort the list in reverse order and print out the new, sorted list.

In [54]:
d = {'a':10, 'b':1, 'c':22}
l = list()
for key, val in d.items():
    l.append((val, key))
l.sort(reverse=True)  # sort once, after the list is complete
l

[(22, 'c'), (10, 'a'), (1, 'b')]

By carefully constructing the list of tuples to have the value as the first element of each tuple, we can sort the list of tuples and get our dictionary contents sorted by value.

## The Most Common Words


Coming back to our running example of the text from Romeo and Juliet Act 2, Scene 2, we can augment our program to use this technique to print the ten most common words in the text as follows:

In [2]:
import string
fhand = open('romeo-full.txt')
counts = dict()
for line in fhand:
    line = line.translate(str.maketrans('', '', string.punctuation))
    line = line.lower()
    words = line.split()
    for word in words:
        if word not in counts:
            counts[word] = 1
        else:
            counts[word] += 1

# Sort the dictionary by value
lst = list()
for key, val in list(counts.items()):
    lst.append((val, key))

lst.sort(reverse=True)

for val, key in lst[:10]:
    print(val, key)

61 i
42 and
40 romeo
34 to
34 the
32 thou
32 juliet
30 that
29 my
24 thee


The first part of the program, which reads the file and computes the dictionary that maps each word to its count in the document, is unchanged. But instead of simply printing out `counts` and ending the program, we construct a list of `(val, key)` tuples and then sort the list in reverse order.

Since the value is first, it will be used for the comparisons. If there is more than one tuple with the same value, the sort looks at the second element (the key), so tuples with the same value are further sorted by key. Because of `reverse=True`, those ties come out in reverse alphabetical order, which is why "to" is listed before "the" in the output.

At the end we write a nice `for` loop which does a multiple assignment iteration and prints out the ten most common words by iterating through a slice of the list (`lst[:10]`).

So now the output finally looks like what we want for our word frequency analysis.

The fact that this complex data parsing and analysis can be done with an easy-to-understand 19-line Python program is one reason why Python is a good choice as a language for exploring information.
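For comparison, the standard library's `collections.Counter` can do the counting and ranking. The sketch below is an alternative to the program above, not a replacement for the tuple technique; it uses a short inline string (the variable `text`) instead of `romeo-full.txt` so that it runs on its own:

```python
import string
from collections import Counter

text = ('But soft, what light through yonder window breaks? '
        'It is the east, and Juliet is the sun.')

# Strip punctuation and lowercase, as in the program above.
cleaned = text.translate(str.maketrans('', '', string.punctuation)).lower()
counts = Counter(cleaned.split())

# most_common(n) returns a list of (word, count) tuples, highest count first.
for word, count in counts.most_common(2):
    print(count, word)
```

Note that `most_common` returns `(word, count)` tuples, the reverse of the `(val, key)` order used in the sorted list above.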

## Using Tuples as Keys in Dictionaries

Because tuples are hashable and lists are not, if we want to create a composite key to use in a dictionary we must use a tuple as the key.
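A quick sketch of the difference (the dictionary `grid` and its keys are invented for illustration): a tuple is accepted as a key, while a list raises `TypeError`:

```python
grid = {}

# A tuple is hashable, so it can be used as a composite key.
grid[(0, 1)] = 'tuple key works'
print(grid[(0, 1)])

# A list is mutable and unhashable, so the same assignment fails.
try:
    grid[[0, 1]] = 'list key'
except TypeError as err:
    print('TypeError:', err)
```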

We would encounter a composite key if we wanted to create a telephone directory that maps from last-name, first-name pairs to telephone numbers. Assuming that we have defined the variables `last`, `first`, and `number`, we could write a dictionary assignment statement as follows:

In [7]:
directory = dict()
last, first, number = 'Doe', 'Jane', '555-0100'  # illustrative values
directory[last, first] = number

The expression in brackets is a tuple. We could use tuple assignment in a `for` loop to traverse this dictionary.

In [6]:
for last, first in directory:
    print(first, last, directory[last,first])

Jane Doe 555-0100

This loop traverses the keys in `directory`, which are tuples. It assigns the elements of each tuple to `last` and `first`, then prints the name and corresponding telephone number.

## Sequences: Strings, Lists, and Tuples

I have focused on lists of tuples, but almost all of the examples in this chapter also work with lists of lists, tuples of tuples, and tuples of lists. To avoid enumerating the possible combinations, it is sometimes easier to talk about sequences of sequences.

In many contexts, the different kinds of sequences (strings, lists, and tuples) can be used interchangeably. So how and why do you choose one over the others?

To start with the obvious, strings are more limited than other sequences because the elements have to be characters. They are also immutable. If you need the ability to change the characters in a string (as opposed to creating a new string), you might want to use a list of characters instead.

Lists are more common than tuples, mostly because they are mutable. But there are a few cases where you might prefer tuples:

- In some contexts, like a `return` statement, it is syntactically simpler to create a tuple than a list. In other contexts, you might prefer a list.
- If you want to use a sequence as a dictionary key, you have to use an immutable type like a tuple or string.
- If you are passing a sequence as an argument to a function, using tuples reduces the potential for unexpected behavior due to aliasing.
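As an illustration of the `return` case, a function can return several values by listing them after `return`, which builds a tuple. The function `min_max` below is a hypothetical helper written for this sketch:

```python
def min_max(values):
    """Return the smallest and largest item of a non-empty sequence."""
    # The comma builds a tuple; no parentheses or list brackets are needed.
    return min(values), max(values)

result = min_max([4, 1, 7])
print(result)

# Tuple assignment unpacks the returned tuple into two variables.
low, high = min_max([4, 1, 7])
print(low, high)
```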

Because tuples are immutable, they don't provide methods like `sort` and `reverse`, which modify existing lists. However, Python provides the built-in functions `sorted` and `reversed`, which take any sequence as a parameter and leave it unchanged: `sorted` returns a new list with the elements in sorted order, and `reversed` returns an iterator over the elements in reverse order.
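A short sketch of both functions applied to a tuple (the tuple `t` is made up for this example). Since `sorted` gives back a list and `reversed` gives back an iterator, each result is passed to `tuple` when a tuple is wanted:

```python
t = (3, 1, 2)

print(sorted(t))            # a new list
print(tuple(sorted(t)))     # converted back to a tuple
print(tuple(reversed(t)))   # reversed() yields the items back to front
print(t)                    # the original tuple is unchanged
```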

## List Comprehension

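A common task is building a new list by transforming each element of an existing one. For example, the following loop converts a list of numeric strings to integers and then sums them:

In [9]: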
list_of_ints_in_strings = ['42', '65', '12']
list_of_ints = []
for x in list_of_ints_in_strings:
    list_of_ints.append(int(x))

print(sum(list_of_ints))

119


With list comprehension, the above code can be written in a more compact manner:

In [10]:
list_of_ints_in_strings = ['42', '65', '12']
list_of_ints = [ int(x) for x in list_of_ints_in_strings ]
print(sum(list_of_ints))

119


## Debugging

Lists, dictionaries, and tuples are known generically as *data structures*. In this chapter we are starting to see compound data structures, like lists of tuples, and dictionaries that contain tuples as keys and lists as values. Compound data structures are useful, but they are prone to what I call *shape errors*: errors caused when a data structure has the wrong type, size, or composition, or when you write some code, forget the shape of your data, and introduce an error. For example, if you are expecting a list with one integer and I give you a plain old integer (not in a list), it won't work.
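One way to catch a shape error early is to check the type of a value before using it. The sketch below is only an illustration; `total` is a hypothetical function that expects a list or tuple of integers:

```python
def total(numbers):
    """Sum a list or tuple of integers, rejecting values with the wrong shape."""
    if not isinstance(numbers, (list, tuple)):
        raise TypeError('expected a list or tuple, got ' + type(numbers).__name__)
    return sum(numbers)

print(total([42]))      # a list with one integer works

try:
    total(42)           # a plain integer has the wrong shape
except TypeError as err:
    print('shape error:', err)
```

Printing `type(value)` and `len(value)` at the point of failure is often enough to spot this kind of mismatch.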

## Glossary

**comparable**: A type where one value can be checked to see if it is greater than, less than, or equal to another value of the same type. Types which are comparable can be put in a list and sorted.

**data structure**: A collection of related values, often organized in lists, dictionaries, tuples, etc.

**DSU**: Abbreviation of "decorate-sort-undecorate", a pattern that involves building a list of tuples, sorting, and extracting part of the result.

**gather**: The operation of assembling a variable-length argument tuple.

**hashable**: A type that has a hash function. Immutable types like integers, floats, and strings are hashable; mutable types like lists and dictionaries are not.

**scatter**: The operation of treating a sequence as a list of arguments.

**shape (of a data structure)**: A summary of the type, size, and composition of a data structure.

**singleton**: A list (or other sequence) with a single element.

**tuple**: An immutable sequence of elements.

**tuple assignment**: An assignment with a sequence on the right side and a tuple of variables on the left. The right side is evaluated and then its elements are assigned to the variables on the left.
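The glossary entries for *gather* and *scatter* refer to the `*` operator in function definitions and calls. A minimal sketch, where `print_all` is a hypothetical function written for this example:

```python
def print_all(*args):
    """Gather: any number of positional arguments arrive as one tuple."""
    print(type(args).__name__, args)
    return args

gathered = print_all(1, 2.0, '3')

# Scatter: the * operator spreads a sequence across separate arguments.
t = (7, 3)
quotient, remainder = divmod(*t)
print(quotient, remainder)
```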

## Exercise 1 

Video: Sorting a Dictionary Using Tuples

### <https://www.youtube.com/watch?v=hMJpet-gtc0>

Revise a previous program as follows: Read and parse the "From " lines and pull out the addresses from the line. Count the number of messages from each person using a dictionary.

After all the data has been read, print the person with the most commits by creating a list of (count, email) tuples from the dictionary. Then sort the list in reverse order and print out the person who has the most commits.

```
Sample Line:
From stephen.marquard@uct.ac.za Sat Jan  5 09:14:16 2008

Enter a file name: mbox-short.txt
cwen@iupui.edu 5

Enter a file name: mbox.txt
zqian@umich.edu 195
```

In [29]:
def ex_10_01():
    filename = input("Enter file name: ")
    fhand = open(filename, 'r')
    email_addresses = {}
    
    for line in fhand:
        if line.startswith("From "):
            email = line.split()[1]
            email_addresses[email] = email_addresses.get(email, 0) + 1 
    
    lst = []
    for key, value in email_addresses.items():
        lst.append((value, key))  # count first, so the sort is by count
        
    lst.sort(reverse=True) # largest first with reverse=True
    person_tuple = lst[0]  # set email person to be 1st item in lst
    print(person_tuple[1], person_tuple[0]) # print in the order of the exercise
        
    
ex_10_01()           

Enter file name: mbox-short.txt
cwen@iupui.edu 5


In [60]:
# class example
def ex_10_01():
    handle = open(input("Enter file name: "))
    histogram = dict()
    
    for line in handle:
        line_split = line.split()
        if len(line_split) > 0 and line_split[0] == "From":
            # print(line_split)
            email = line_split[1]
            histogram[email] = histogram.get(email, 0) + 1

    reverse_tuples = histogram.items()
    tuples = []

    for email, count in reverse_tuples:
        tuples.append((count, email))

    tuples.sort(reverse=True)
    maxCount, maxEmail = tuples[0]
    print(maxEmail, maxCount)

ex_10_01()

Enter file name: mbox-short.txt
cwen@iupui.edu 5


## Exercise 2 

This program counts the distribution of the hour of the day for each of the messages. You can pull the hour from the "From " line by finding the time string and then splitting that string into parts using the colon character. Once you have accumulated the counts for each hour, print out the counts, one per line, sorted by hour as shown below. Accept and complete the assignment in GitHub Classroom.

```
python timeofday.py
Enter a file name: mbox-short.txt
04 3
06 1
07 1
09 2
10 3
11 6
14 1
15 2
16 4
17 2
18 1
19 1
```

In [40]:
def ex_10_02():
    """
    This program counts the distribution of the hour of the day for each of 
    the messages. You can pull the hour from the "From " line by finding the 
    time string and then splitting that string into parts using the colon 
    character. Once you have accumulated the counts for each hour, print out
    the counts, one per line, sorted by hour as shown below.
    
    python timeofday.py
    Enter a file name: mbox-short.txt
    04 3
    06 1
    07 1
    09 2
    10 3
    11 6
    14 1
    15 2
    16 4
    17 2
    18 1
    19 1
    """
    
    filename = input("Enter file name: ")
    fhand = open(filename, 'r')
    hours_of_day = {}
    
    for line in fhand:
        if line.startswith("From "):
            time = line.split()[5]
            hour = time.split(":")[0]
            hours_of_day[hour] = hours_of_day.get(hour, 0) + 1 
    # Convert the dictionary to a list of (hour, count) tuples and sort by hour
    lst = list(hours_of_day.items())
    lst.sort()
    for l in lst:
        print(l[0], l[1])
    print(lst)

    
ex_10_02()   

Enter file name: mbox-short.txt
04 3
06 1
07 1
09 2
10 3
11 6
14 1
15 2
16 4
17 2
18 1
19 1
[('04', 3), ('06', 1), ('07', 1), ('09', 2), ('10', 3), ('11', 6), ('14', 1), ('15', 2), ('16', 4), ('17', 2), ('18', 1), ('19', 1)]


In [62]:
# submitted
def exercise_10_2():
    """
    This program counts the distribution of the hour of the day for each of 
    the messages. You can pull the hour from the "From " line by finding the 
    time string and then splitting that string into parts using the colon 
    character. Once you have accumulated the counts for each hour, print out
    the counts, one per line, sorted by hour as shown below.
    
    python timeofday.py
    Enter a file name: mbox-short.txt
    04 3
    06 1
    07 1
    09 2
    10 3
    11 6
    14 1
    15 2
    16 4
    17 2
    18 1
    19 1
    """
    # first, prompt for the file name; default to 'mbox-short.txt' if the input is empty
    name = input("Enter file:")
    if len(name) < 1:
        name = "mbox-short.txt"
    handle = open(name)
    # second, set empty dictionary to store hours of day 
    hours_dict = {}
    # pull the hour from each "From " line and count it with the .get() idiom
    for line in handle:
        if line.startswith("From "):
            time = line.split()[5]
            hour = time.split(":")[0]
            hours_dict[hour] = hours_dict.get(hour, 0) + 1
            #print(hours_dict)
    # create the list of tuples with .items() once, after the loop has finished
    tup_lst = list(hours_dict.items())
    #print(tup_lst)
    tup_lst.sort()  # sort the list of tuples
    # clean up the print to match exercise by using another for loop
    for l in tup_lst:
        print(l[0], l[1])
    #print(tup_lst)

# exercise_10_2()

## Exercise 3

Write a program that reads a file and prints the letters in decreasing order of frequency. Your program should convert all the input to lower case and only count the letters a-z. Your program should not count spaces, digits, punctuation, or anything other than the letters a-z. Find text samples from several different languages and see how letter frequency varies between languages. Compare your results with the tables at https://wikipedia.org/wiki/Letter_frequencies.

In [58]:
def ex_10_03():
    """
    Write a program that reads a file and prints the letters in decreasing 
    order of frequency. Your program should convert all the input to lower 
    case and only count the letters a-z. Your program should not count spaces,
    digits, punctuation, or anything other than the letters a-z. Find text 
    samples from several different languages and see how letter frequency 
    varies between languages. Compare your results with the tables at 
    https://wikipedia.org/wiki/Letter_frequencies.
    """
    
    alphabet = 'abcdefghijklmnopqrstuvwxyz'
    
    filename = input("Enter file name: ")
    fhand = open(filename, 'r')
    text = fhand.read()
    # print(text)
    
    freq_dict = {}
    for ch in text.lower():
        if ch in alphabet:
            freq_dict[ch] = freq_dict.get(ch, 0) + 1
            
    # now to order it using tuples
    lst = []
    for key,value in freq_dict.items():
        lst.append((value,key)) # set as tuple
    lst.sort(reverse=True)
    
    # print better
    for freq, letter in lst:
        print(letter, freq)
        
ex_10_03()

Enter file name: mbox-short.txt
e 5436
a 5223
i 4494
o 4174
r 4064
t 4050
s 3738
u 3123
c 3088
n 2575
p 2497
m 2436
d 2004
l 1832
h 1392
f 1257
k 1167
b 1134
g 1027
v 997
j 959
y 643
w 586
x 482
z 78
q 57
