# Tuples & Sets

This material is adapted from portions of Chapters 10 and 11 of [*Think Python*, 3rd edition](https://greenteapress.com/wp/think-python-3rd-edition), by Allen B. Downey.

In this notebook you will learn to:

- Create and use tuples as immutable sequences
- Understand when to use tuples vs. lists
- Return multiple values from functions using tuples
- Unpack sequences into individual variables
- Use sets for deduplication and membership testing
- Recognize how type characteristics determine what an object can do

## Tuples

Tuples (commonly pronounced "too-ple" in the context of programming) are another built-in object type in Python. Like lists, they are ordered heterogeneous sequences. Unlike lists, they are immutable.

### Creating Tuples

To create a tuple, you *can* write a comma-separated list of values.

In [None]:
t = 'a', 'u', 'b', 'i', 'e'
type(t)

Although it is *not always necessary*, it is common to enclose tuples in parentheses.

In [None]:
t = ('a', 'u', 'b', 'i', 'e')
type(t)

Python will always *display* tuples in parentheses, regardless of the manner in which they are created.

In [None]:
t = 'a', 'u', 'b', 'i', 'e'
print(t)

To create a tuple with a single element, you *must* include a final comma.

In [None]:
t1 = 'p',
type(t1)

Another way to create a tuple is the built-in function `tuple`. If the argument is a sequence, the result is a tuple with its elements.

In [None]:
t = tuple('aubie')
print(t)  # ('a', 'u', 'b', 'i', 'e')

If required, an empty tuple can be created with either empty parentheses `()` or `tuple()`.

In [None]:
t0_p = ()
print(len(t0_p), type(t0_p))

t0_f = tuple()
print(len(t0_f), type(t0_f))

Note from [the Python documentation](https://docs.python.org/3/library/stdtypes.html#tuples):

*...it is actually the comma which makes a tuple, not the parentheses. The parentheses are optional, **except in the empty tuple case, or when they are needed to avoid syntactic ambiguity**. For example, `f(a, b, c)` is a function call with three arguments, while `f((a, b, c))` is a function call with a 3-tuple as the sole argument.*

In [None]:
len(1, 2, 3)  # TypeError, interpreted as three arguments

In [None]:
len((1, 2, 3))  # use inner parens to "avoid syntactic ambiguity"

### Common Operations

Since tuples are sequences, all the relevant sequence operations work as they do for lists.
Because tuples are immutable, none of these operations change the original object; those that produce a tuple return a new one.

In [None]:
# indexing returns the object at that index
print(t[0])  # 'a'

# slicing returns a new tuple
print(t[::-1])  # ('e', 'i', 'b', 'u', 'a')

# concatenation and repetition work as expected
print(tuple("go ") + t)  # ('g', 'o', ' ', 'a', 'u', 'b', 'i', 'e')
print(t[1:2] * 3)        # ('u', 'u', 'u')

# comparison and membership
print(t1 < t)       # False; tuples compare element by element, and 'p' < 'a' is False
print('a' in t)     # True
print('a' not in t) # False

# len, min, max
print(len(t))  # 5
print(min(t))  # 'a'
print(max(t))  # 'u'

### Important Methods

Because they are immutable, tuples only support a limited set of methods.

In [None]:
t2 = tuple("terminator")

# count returns the number of instances of "t" in t2
print(t2.count("t"))  # 2

# index returns the index number of the first instance of "i" in t2
print(t2.index("i"))  # 4

However, because tuples are immutable, their elements cannot be changed by item assignment.

In [None]:
t2[0] = 'T'  # TypeError
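Since item assignment isn't allowed, the usual approach is to build a new tuple from pieces of the old one and assign it to a name. A minimal sketch using slicing and concatenation (`t2_fixed` is just an illustrative name; `t2` is recreated so the cell runs on its own):

```python
t2 = tuple("terminator")

# build a new tuple: a one-element tuple ('T',) plus everything after index 0
t2_fixed = ('T',) + t2[1:]

print(t2_fixed[:4])  # ('T', 'e', 'r', 'm')
print(t2[0])         # t - the original tuple is unchanged
```

Note the trailing comma in `('T',)`: concatenation requires two tuples, and `('T')` would just be a string.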

Also, tuples don't have any of the methods that modify lists, like `append` and `remove`.

In [None]:
t.remove('l')  # AttributeError

An *attribute* is a variable or method associated with an object - this error message means that tuples don't have a method named `remove`.
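One way to check which attributes a type supports, without triggering an error, is the built-in `hasattr` function, which takes an object and an attribute name (as a string) and returns a `bool`. A short sketch comparing a tuple and a list with the same contents (`tup` and `lst` are illustrative names):

```python
tup = ('a', 'u', 'b', 'i', 'e')
lst = ['a', 'u', 'b', 'i', 'e']

# both types support the non-modifying methods
print(hasattr(tup, 'count'), hasattr(lst, 'count'))    # True True

# only the mutable list has methods that modify it in place
print(hasattr(tup, 'remove'), hasattr(lst, 'remove'))  # False True
print(hasattr(tup, 'append'), hasattr(lst, 'append'))  # False True
```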

### Exercise - Return, Assign, and Unpack a Tuple

Write a function, `rectangle_properties`, that takes a length and width and calculates the area, perimeter, and aspect ratio (the length divided by the width). Return all three values as a tuple, in that order. Call the function and assign the returned tuple to a variable called `result`. Use indexing and an f-string to print the result as shown below. Test your results.

Using a length of 2 and width of 3, your output should look like this:

```text
Area: 6.0, Perimeter: 10.0, Aspect Ratio: 0.67
```

In [None]:
# code here...

In [None]:
# this code block should run without errors
assert rectangle_properties(2, 3) == (6, 10, 0.6666666666666666)
assert rectangle_properties(4, 2) == (8, 12, 2.0)

#### Solution / Discussion

In [None]:
def rectangle_properties(length, width):
    area = length * width
    perimeter = (2 * length) + (2 * width)
    aspect = length / width
    return (area, perimeter, aspect)

result = rectangle_properties(2, 3)
a = result[0]
p = result[1]
r = result[2]
print(f"Area: {a:0.1f}, Perimeter: {p:0.1f}, Aspect Ratio: {r:0.2f}")

Note that Python allows you to directly assign the components of a sequence as follows:

In [None]:
a, p, r = rectangle_properties(2, 3)
print(a, p, r)

This concise method of assignment is commonly used and is called *unpacking*. Each element of the returned tuple is assigned, in order, to the corresponding variable in the comma-separated list of names. For more information see the section below entitled *Sequence Unpacking*.

## More About Types

So far we've used Python's `int`, `float`, `str`, `list`, and `tuple` types.
We've also covered a few special case types like `bool` (for `True` and `False` values) and `NoneType` (for `None` values).
Together, they are foundational to Python programming. Recall that:

- Everything in Python is an object
- Each object has a type, value, and identity
- The type of an object determines its capabilities

We've seen that integers and floats can use the `+`, `-`, `*`, and `/` operators, with the expected results. Strings can also use `+` and `*`, for concatenation and repetition, but not `-` or `/`.

In [None]:
print(42 + 3.14)  # int plus float
print(42 / 3.14)  # int divided by float

phrase = "repeat" + " this"  # string concatenation
print(phrase * 4)  # string repetition
print(phrase / 2)  # TypeError: the / operator is not supported for strings

This is one example of the claim that "the type of an object determines its capabilities." Another example is the differing methods and functions supported by each type.

We've seen that the `len` function works on all sequences, but not numerics, and the `append` method works on lists, but not strings or tuples.

In [None]:
# len works on lists, tuples, and strings
print(len([1, 2, 3]))           # 3
print(len(('1', 'b', '3.0')))   # 3; note - inner parentheses required
print(len("how long is this"))  # 16

# but not numerics
print(len(42))  # TypeError!

In [None]:
l = ["lists", "are", "..."]
l.append("mutable")
print(l)

s = "strings are ... "
s.append("immutable")  # AttributeError

All these rules may seem arbitrary and disconnected. Let's try to make some sense of them.

The differences depend on the nature of the types in question.
Characteristics like ordered / unordered, mutable / immutable, and homogeneous / heterogeneous are fundamental differentiators between types. Learning how those characteristics are associated with each object type makes it easier to reason out what they can do and how they work. It is also essential for debugging the most common errors made by those learning Python.

The following table summarizes the key characteristics of the sequence types we've covered:

| Type   | Ordered | Mutable | Heterogeneous | Notes                                    |
|:-------|:--------|:--------|:--------------|:-----------------------------------------|
| List   | Yes     | Yes     | Yes           | Common for collections, often homogeneous |
| Tuple  | Yes     | No      | Yes           | Used for fixed-size records or return values |
| String | Yes     | No      | No            | Immutable sequence of characters         |

Ordered types (sequences) can be indexed and sliced, and, like all collections, they have a length. These capabilities are not relevant to numerics, which represent only a single value.

In [None]:
len(42)  # TypeError

The elements of mutable sequences can be changed after creation, which allows for in-place modification (aka mutation) by index assignment or applicable methods.

In [None]:
l[-1] = "useful"  # index assignment
print(l)

l.insert(3, "very")  # insert method modifies in place (mutates)
print(l)

Immutable sequences cannot be modified after creation, so operations that appear to change them actually create new objects, which must be reassigned to a variable to keep the result.

In [None]:
# concatenation creates a new string object
before = id(s)  # get the unique ID of s before the modification
s = s + "useful"
print(s)

after = id(s)
print(before == after)  # False - new object was created

In [None]:
# string methods create new objects
s.upper()  # string methods do not change in place (immutable)
print(s)

s = s.upper()  # must be reassigned
print(s)

We'll add to this table as we introduce sets and dictionaries, starting now...

## Sets

We've talked a lot about collections of ordered elements, called sequences. This raises the question: do unordered collections exist in Python? Yes! To introduce this concept we'll briefly cover the basics of the `set` object type before moving on to its big brother, the dictionary, in the next notebook.

Sets are **unordered** containers of **unique**, **immutable** elements. Duplicate and/or mutable elements are not allowed. Sets are themselves mutable and heterogeneous.

Sets are a special purpose object type. They offer operators and methods that mimic the mathematical operations from [set theory](https://en.wikipedia.org/wiki/Set_theory). If that doesn't ring a bell, think [Venn diagrams](https://en.wikipedia.org/wiki/Venn_diagram), union, intersection, difference, and the like.

For our purposes, they are most useful conceptually as an introduction to unordered collections, and practically for removing duplicates from a sequence (aka deduplication) and for fast membership testing. For more details on the operations and methods supported by sets, including those that implement set theory, see Real Python's [Sets in Python](https://realpython.com/python-sets/) article.
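As a brief preview of the set-theory operations mentioned above, here is a sketch using the `set` function (covered next) with the `|` (union), `&` (intersection), and `-` (difference) operators. The variable names are illustrative, and `sorted` is used only to give a predictable display order, since sets themselves have none:

```python
apple_letters = set('apple')   # letters a, p, l, e
grape_letters = set('grape')   # letters g, r, a, p, e

print(sorted(apple_letters | grape_letters))  # union: ['a', 'e', 'g', 'l', 'p', 'r']
print(sorted(apple_letters & grape_letters))  # intersection: ['a', 'e', 'p']
print(sorted(apple_letters - grape_letters))  # difference: ['l']
```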

### Creating Sets

The `set` function can be used to create a set object from any sequence, like a tuple as shown here:

In [None]:
vals = 'apple', 'banana', 'cherry'
set(vals)

Just as lists can be created with square brackets, sets can be created with curly brackets. As with lists, the brackets must contain a comma-separated list of the desired elements:

In [None]:
{'apple', 'banana', 'cherry'}

If you put a sequence variable inside the curly brackets instead, the result is a single-element set containing that sequence, which is probably not the intended effect:

In [None]:
{vals}

Python always displays a set surrounded by curly brackets, except for empty sets, which can **only** be created with `set`, and are displayed as `set()`:

In [None]:
# create an empty set
set()

We'll soon see that empty curly brackets (`{}`) denote an empty object of another type, so they cannot be used to create an empty set.

As a final reminder about creating sets: although sets are themselves mutable, they cannot contain mutable objects like lists or other sets.

In [None]:
{[1, 2, 3], [4, 5, 6]}  # TypeError
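Immutable objects such as numbers, strings, and tuples *can* be set elements, which makes a set of coordinate tuples a handy way to track locations. A minimal sketch with hypothetical points (note that a tuple's own elements must also be immutable; a tuple containing a list would raise the same `TypeError`):

```python
seen_points = {(0, 0), (1, 2)}

seen_points.add((0, 0))        # already present, so the set is unchanged
seen_points.add((3, 4))

print(len(seen_points))        # 3
print((1, 2) in seen_points)   # True
```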

So far, we've created sets of unique values. When creating or modifying sets, any duplicated values are automatically eliminated:

In [None]:
uniques = set('banana')
print(uniques)

In [None]:
uniques.add('a')
print(uniques)

The resulting set consists only of the unique elements in the sequence, and their order is **not** preserved. Adding `'a'` had no effect, because the set already contained it. It is easy to imagine situations where this behavior is valuable, if only for deduplication.
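Because sets are unordered, there is no "first" element, so indexing and slicing are not supported. Operations that don't depend on position, like `len` and the `in` operator, still work. A short sketch (the `try`/`except` is only there so the cell keeps running after the expected error):

```python
uniques = set('banana')

print(len(uniques))     # 3
print('n' in uniques)   # True
print('z' in uniques)   # False

try:
    uniques[0]          # sets have no positions, so this raises TypeError
except TypeError as err:
    print("TypeError:", err)
```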

Here we demonstrate removing duplicates from a list by converting it to a set and back:

In [None]:
letters = list('banana')
unique_letters_set = set(letters)               # list -> set drops duplicates
unique_letters_list = list(unique_letters_set)  # set -> list (order not preserved)
print(letters, unique_letters_set, unique_letters_list, sep='\n')

This method can be condensed into just `list(set(seq))`, where `seq` is any object of sequence type.
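`list(set(seq))` does not keep the original order. If order matters, one common approach is to loop over the sequence and use a set to remember which elements have already been seen, relying on the fact that membership tests on a set are fast. A minimal sketch (`dedupe_in_order` is a hypothetical helper, not a built-in):

```python
def dedupe_in_order(seq):
    '''return a list of the unique elements of seq, in first-seen order'''
    seen = set()
    result = []
    for item in seq:
        if item not in seen:   # membership test on a set
            seen.add(item)
            result.append(item)
    return result

print(dedupe_in_order('banana'))          # ['b', 'a', 'n']
print(dedupe_in_order([3, 1, 3, 2, 1]))   # [3, 1, 2]
```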

### Exercise - Check for Duplicates

Use what you've learned about sets to write a function, `has_duplicates`, that takes a sequence and returns `True` if it contains duplicate elements, or `False` if not.

Then write a function, `test_sequence`, that takes a sequence and uses `has_duplicates` to test it. For the tests to pass, `test_sequence` must **return** the results as shown below, not print them.

For the test value `'banana'`, `test_sequence` should return:

```text
The sequence 'banana' has duplicates.
```

For the test value `[1, 2, 3]`, `test_sequence` should return:

```text
The sequence '[1, 2, 3]' does not have duplicates.
```

In [None]:
# first write has_duplicates
# can you do it in 3 lines or less?


# then write test_sequence using has_duplicates

After writing the function definitions above, the following tests should pass.

In [None]:
assert test_sequence('banana') == "The sequence 'banana' has duplicates."
assert test_sequence([1, 2, 3]) == "The sequence '[1, 2, 3]' does not have duplicates."

#### Solution / Discussion

In [None]:
def has_duplicates(seq):
    '''compare the length of the deduplicated and original sequence'''
    return len(set(seq)) != len(seq)

def test_sequence(seq):
    '''evaluate the result of has_duplicates and return a message'''
    if has_duplicates(seq):
        return f"The sequence '{seq}' has duplicates."
    else:
        return f"The sequence '{seq}' does not have duplicates."

# print the results
print(test_sequence('banana'))
print(test_sequence([1, 2, 3]))

There are two things worth discussing in this solution.

First, `has_duplicates` directly returns the `bool` result of the inequality comparison. The name of the function suggests it will return `True` or `False`. The same approach is seen in Python methods like `isalpha`, the string method that answers the question "does the string contain only alphabetic characters (letters such as a-z and A-Z)?" with a `bool` that signifies yes or no.

Second, the `if` statement in `test_sequence` directly evaluates the value returned by `has_duplicates` with `if has_duplicates(seq):`. It is not necessary to compare this return value with `True` or `False`, as in `if has_duplicates(seq) == True:`.

These stylistic decisions are in line with Python best practices, but are not required. They are noted here for clarity.

### Updated Type Table

Now that we've covered sets, let's update the type table from earlier to include this new type.

| Type       | Category | Ordered | Mutable | Heterogeneous | Notes                                       |
|:-----------|:---------|:--------|:--------|:--------------|:--------------------------------------------|
| List       | Sequence | Yes     | Yes     | Yes           | Common for collections, often homogeneous    |
| Tuple      | Sequence | Yes     | No      | Yes           | Used for fixed-size records or return values |
| String     | Sequence | Yes     | No      | No            | Immutable sequence of characters             |
| Set        | Collection | No    | Yes     | Yes           | Unordered, no duplicates allowed             |

Sets don't fit neatly into the "Sequence" category because they are unordered. We'll revisit this table once more when we cover dictionaries in the next notebook.

## Best Practices

### Tuple or List?

If tuples are essentially immutable lists, why bother? The mutability of lists makes them more flexible, as evidenced by the wealth of available methods. Why use a less capable object type?

In fact, the simplicity of tuples can be an advantage. Their immutability prevents unintended changes, a common source of bugs.

Practically speaking, tuples are commonly used to represent a fixed collection of related values, like coordinates `(x, y)`. In many cases, the related values are of different types, and the structure / order of the values is important. For example, student data might be represented as `(name, id, grades)`, where `name` is a string, `id` is an integer, and `grades` is a sequence (list or tuple) of floating point values.

In [None]:
student = ("Aubie", 8675309, (95.8, 100.0, 91.1))

This kind of data structure is sometimes called a *record*. Because it is meant to represent a single "entity" or "item" with multiple attributes, it is important that it remains intact. The immutability and heterogeneity of tuples align well with this.
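Records are often collected in a list and processed one at a time. Unpacking each tuple directly in the `for` statement (see *Sequence Unpacking* below) gives each field a readable name. A short sketch, where the second student record is hypothetical data added for illustration:

```python
students = [
    ("Aubie", 8675309, (95.8, 100.0, 91.1)),
    ("Nova", 1234567, (88.0, 92.5, 79.0)),   # hypothetical record
]

# each record is unpacked into three named variables
for name, student_id, grades in students:
    average = sum(grades) / len(grades)
    print(f"{name} ({student_id}): {average:.1f}")
```

The structure of each record never changes, so a tuple fits; the roster itself may grow, so a list fits.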

Lists, on the other hand, are commonly used for sequences of homogeneous data, where the order of the values itself does not carry important information about the object. This usage aligns with lists being mutable, allowing for easy addition, removal, or modification of elements.

In [None]:
fruits = ['apple', 'banana', 'mango', 'broccoli', 'strawberry']
fruits.remove('broccoli')
print(fruits)

In summary, lists in Python are generally used when you need to collect data that will change or grow dynamically. This leverages their mutability and assumes that an element's position does not carry meaning on its own. Tuples are typically used for fixed-size collections of heterogeneous data, where the position of each value matters and resizing is undesirable.

These conventions are better taken as general guidance than a strict rule. Heterogeneous lists and homogeneous tuples both have their places, but the recommendations above reflect *idiomatic* Python. Idiomatic is a term used to describe code that aligns with the design and philosophy of the language; it is "the way it should be done".

To learn more about the design and philosophy of Python, see [The Zen of Python](https://en.wikipedia.org/wiki/Zen_of_Python).

### Prefer Tuples for Multiple Return Values

As we saw in the Functions II notebook, a function can only return one object. To return more than one value, use a container object type. While we previously demonstrated this with a list, it is much more common in Python to use tuples for this purpose.

Revisiting the example we used before, but using a tuple return value:

In [None]:
def divide_with_remainder(dividend, divisor):
    quotient = dividend // divisor
    remainder = dividend % divisor
    return (quotient, remainder)  # return both values in a tuple

In [None]:
result = divide_with_remainder(17, 5)
print(result)  # (3, 2)

This recommendation follows the best practice of using tuples as immutable records. As with any recommendation, depending on the nature of the function, other types may be better suited.
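Python's built-in `divmod` function performs this same calculation and, fittingly, returns its result as a tuple. A quick comparison (with `divide_with_remainder` redefined so the cell runs on its own):

```python
def divide_with_remainder(dividend, divisor):
    return (dividend // divisor, dividend % divisor)

print(divmod(17, 5))                                  # (3, 2)
print(divide_with_remainder(17, 5) == divmod(17, 5))  # True

# the returned tuple can be unpacked like any other
quotient, remainder = divmod(17, 5)
print(quotient, remainder)                            # 3 2
```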

### Sequence Unpacking

Tuple return values are frequently used in combination with *unpacking* - assigning each element of a sequence to its own variable in a single statement.

Here is an example of a function that returns a tuple.

In [None]:
def min_max(t):
    return min(t), max(t)

`max` and `min` are built-in functions that find the largest and smallest elements of a sequence.
`min_max` computes both and returns a tuple of two values.

We can assign the results to individual variables like this:

In [None]:
low, high = min_max([2, 4, 1, 3])
print(low, high)

The number of variables on the left must match the number of elements in the sequence on the right. If they don't, Python raises a `ValueError`:

In [None]:
a, b, c = min_max([2, 4, 1, 3])  # ValueError: 3 variables, but only 2 values

Unpacking works with any sequence type, not just tuples:

In [None]:
# unpack a list
first, second, third = [10, 20, 30]
print(first, second, third)

# unpack a string
a, b, c = "abc"
print(a, b, c)

This also makes it easy to swap values without a temporary variable:

In [None]:
x = 1
y = 2
x, y = y, x  # swap!
print(x, y)   # 2 1

Python evaluates the entire right side before assigning, so `y, x` creates the tuple `(2, 1)` first, then unpacks it into `x` and `y`.

## Common Gotchas

### Mutation and Assignment Rarely Mix

There are two ways to change data in Python: in-place modification (mutation) and assignment. When working with mutable object types, mixing the two approaches is a common source of woe for newcomers.

In [None]:
l = ["this", "is", "..."]

# don't combine mutation and assignment
l = l.append("tricky!")
print(l)  # None!

Methods are just functions associated with particular object types. Those that use in-place modification rarely have a separate result, so most, including `append`, return the default `None`.

The only cure for this is to know whether the type is mutable and how the method you are using works. Immutable object types can only be changed through reassignment. For mutable types, the `pop` method is one of the few that both mutates and returns a value: it removes an element and returns it.
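A short sketch of the difference: `append` is called on its own line with no assignment, while `pop` can sit on the right side of an assignment because it returns the element it removed.

```python
l = ["this", "is", "..."]

l.append("tricky!")   # mutate in place; don't assign the result
print(l)              # ['this', 'is', '...', 'tricky!']

last = l.pop()        # removes the last element and returns it
print(last)           # tricky!
print(l)              # ['this', 'is', '...']
```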

### The Single-Element Tuple

Creating a tuple with one element requires a trailing comma. Without it, Python interprets the parentheses as grouping, not as tuple construction:

In [None]:
not_a_tuple = ("hello")
print(type(not_a_tuple))  # <class 'str'>

actually_a_tuple = ("hello",)
print(type(actually_a_tuple))  # <class 'tuple'>

This is easy to overlook and can cause subtle bugs. If a function expects a tuple but receives a string, iterating over it will yield individual characters instead of the single element you intended.

In [None]:
for item in ("hello",):
    print(item)   # prints: hello

for item in ("hello"):
    print(item)   # prints: h e l l o (one per line!)

### Empty Curly Brackets

Because `{}` creates an empty *dictionary* (not an empty set), use `set()` when you need an empty set:

In [None]:
mystery = {}
print(type(mystery))  # <class 'dict'>

empty_set = set()
print(type(empty_set))  # <class 'set'>

## Glossary

**tuple:**
An immutable, ordered sequence of values. Created with commas (optionally enclosed in parentheses). Commonly used for fixed-size records and function return values.

**set:**
An unordered, mutable collection of unique, immutable elements. Created with `set()` or with curly brackets containing elements, e.g., `{1, 2, 3}` (empty `{}` creates a dictionary, not a set). Useful for deduplication and membership testing.

**record:**
A fixed collection of related values representing a single entity, often of different types. Tuples are commonly used as records in Python.

**unpacking:**
Assigning the elements of a sequence to individual variables in a single statement, e.g., `a, b = (1, 2)`.

**deduplication:**
The process of removing duplicate values from a collection. Converting a sequence to a `set` and back is a common technique, though it does not preserve the original order.

**idiomatic:**
Code that follows the conventions and design philosophy of the language. Idiomatic Python favors tuples for records and lists for dynamic collections.

**attribute:**
A variable or method associated with an object, accessed with dot notation (e.g., `t.count("a")`).

## Problems

**★ 1. Coordinate Distance**

Write a function `distance(p1, p2)` that takes two coordinate tuples of the form `(x, y)` and returns the Euclidean distance between them. Use `math.sqrt` and tuple indexing.

The formula is: $d = \sqrt{(x_2 - x_1)^2 + (y_2 - y_1)^2}$
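For example, for the points $(0, 0)$ and $(3, 4)$ used in the first test below, the formula works out as:

$$d = \sqrt{(3 - 0)^2 + (4 - 0)^2} = \sqrt{9 + 16} = \sqrt{25} = 5$$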

In [None]:
import math

# your code here

In [None]:
assert distance((0, 0), (3, 4)) == 5.0
assert distance((1, 1), (1, 1)) == 0.0
assert abs(distance((0, 0), (1, 1)) - math.sqrt(2)) < 1e-9
print("All tests passed!")

**★★ 2. Min, Max, and Mean**

Write a function `summarize(numbers)` that takes a list of numbers and returns a tuple of three values: the minimum, maximum, and mean (average). Do not use `min` or `max` - compute all three values yourself using a single loop.

In [None]:
# your code here

In [None]:
assert summarize([10, 20, 30]) == (10, 30, 20.0)
assert summarize([5]) == (5, 5, 5.0)
assert summarize([-3, 0, 3, 6]) == (-3, 6, 1.5)
print("All tests passed!")

**★★ 3. Unique Words**

Write a function `count_unique_words(text)` that takes a string and returns the number of unique words, ignoring case. Words are separated by spaces.

For example, `count_unique_words("the cat and the dog")` should return `4` because "the" appears twice.

In [None]:
# your code here

In [None]:
assert count_unique_words("the cat and the dog") == 4
assert count_unique_words("Go go GO") == 1
assert count_unique_words("each word is unique here") == 5
assert count_unique_words("") == 0
print("All tests passed!")

**★★ 4. Honor Roll**

You are given a list of student records, where each record is a tuple of the form `(name, score)`. Write a function `honor_roll(students, threshold=90)` that returns a sorted list of the names of students whose score meets or exceeds the threshold.

In [None]:
# your code here

In [None]:
roster = [
    ("Aubie", 95),
    ("Al", 72),
    ("Nova", 91),
    ("Pat", 88),
    ("Kit", 100),
]

assert honor_roll(roster) == ["Aubie", "Kit", "Nova"]
assert honor_roll(roster, threshold=95) == ["Aubie", "Kit"]
assert honor_roll(roster, threshold=101) == []
print("All tests passed!")

**★★★ 5. Shipment Report**

A warehouse receives shipment records as a list of tuples. Each tuple has the form `(item, category, quantity)`. The same item may appear in multiple shipments.

Write a function `shipment_report(shipments)` that returns a tuple of three values:

1. The number of unique items
2. A sorted list of all categories
3. The total quantity across all shipments

In [None]:
# your code here

In [None]:
data = [
    ("Gear", "Mechanical", 50),
    ("Bearing", "Mechanical", 100),
    ("Resistor", "Electrical", 200),
    ("Gear", "Mechanical", 30),
    ("Capacitor", "Electrical", 150),
    ("Bearing", "Mechanical", 25),
]

unique_items, categories, total_qty = shipment_report(data)
assert unique_items == 4
assert categories == ["Electrical", "Mechanical"]
assert total_qty == 555
print("All tests passed!")

**★★ 6. Fix This Code**

The following code has four bugs related to tuples and sets. Find and fix all of them. The corrected code should print the output shown below without errors.

```text
First: Aubie, Last: Tiger
Points: ((0, 0),), type: <class 'tuple'>
Visited: {(0, 0)}, type: <class 'set'>
```

In [None]:
# Bug 1: create a tuple containing a single coordinate pair
origin = (0, 0)
points = (origin)  # should be a tuple containing origin

# Bug 2: create an empty set to track visited points
visited = {}

# Bug 3: add origin to visited
visited = visited.add(origin)

# Bug 4: a function that returns two values
def split_name(full_name):
    parts = full_name.split()
    first = parts[0]
    last = parts[1]
    return first
    return last

first, last = split_name("Aubie Tiger")
print(f"First: {first}, Last: {last}")
print(f"Points: {points}, type: {type(points)}")
print(f"Visited: {visited}, type: {type(visited)}")

In [None]:
# your corrected code here

---

Auburn University / Industrial and Systems Engineering
INSY 3010 / Programming and Databases for ISE
© Copyright Danny J. O'Leary.

This material is adapted from [*Think Python*, 3rd edition](https://greenteapress.com/wp/think-python-3rd-edition), by Allen B. Downey. For licensing, attribution, and information: [GitHub INSY3010](https://github.com/olearydj/INSY3010)