# Collections: Lists, tuples and sets<a href="https://colab.research.google.com/github/milocortes/python_course_summer_school_DMDU_2022/blob/main/notebooks/collections_dmdu_2022.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

![python-ecosystem](https://github.com/milocortes/python_course_summer_school_DMDU_2022/raw/main/notebooks/images/data_structures.jpg)
<div style="text-align: center"> https://www.youtube.com/watch?v=B31LgI4Y4DQ </div>





## Data Structures
A data structure is a way of organizing, managing and storing data that allows efficient access and modification.

One of the most important things in writing efficient programs is understanding the guarantees of the data structures you use. 

A large part of performant programming is knowing what questions you are trying to ask of your data and picking a data structure that can answer these questions quickly.

## Lists and tuples

Lists and tuples belong to a class of data structures called *arrays*.

An array is a flat list of data with some intrinsic ordering.

![array.png](attachment:array.png)


Usually in these sorts of data structures, the relative ordering of the elements is as important as the elements themselves.

*A priori* knowledge of the ordering is incredibly valuable: by knowing that data in our array is at a specific position, we can retrieve it in $O(1)$ time.


## Lists and tuples

In Python we have two types of arrays: lists and tuples.

* **Lists** are dynamic arrays that let us modify and resize the data we are storing.
* **Tuples** are static arrays whose contents are fixed and immutable.
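
The difference between the two can be sketched in a few lines. This is a minimal illustration; the variable names are made up for the example and are not part of the original notebook.

```python
# A list can grow and change after it is created.
numbers_list = [1, 2, 3]
numbers_list.append(4)      # resize: the list now has four elements
numbers_list[0] = "one"     # modify an element in place

# A tuple has a fixed size and fixed contents.
numbers_tuple = (1, 2, 3)
try:
    numbers_tuple[0] = "one"
    tuple_was_modified = True
except TypeError:
    # Tuples do not support item assignment
    tuple_was_modified = False

print(numbers_list)
print(numbers_tuple, tuple_was_modified)
```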



## Lists and tuples

System memory on a computer can be thought of as a series of numbered buckets, each capable of holding a number.

Python stores data in these buckets *by reference*, which means the number itself simply points to, or refers to, the data we actually care about.

As a result, these buckets can store any type of data we want.
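
One way to see this reference behaviour is to store the same object in two slots of a list. The sketch below uses illustrative names (`inner`, `outer`) that are assumptions of this example, not something defined elsewhere in the notebook.

```python
inner = [1, 2, 3]
# Each slot holds a reference, so mixed types are fine
outer = [inner, inner, "text", 4.5]

# Both slots refer to the very same list object
same_object = outer[0] is outer[1]

# Changing the object once is visible through both references
inner.append(4)
print(same_object)
print(outer[0], outer[1])
```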

## Lists and tuples 

When we want to create an array (and thus a list or a tuple), we first have to allocate a block of system memory (where every section of this block will be used as an integer-sized pointer to actual data).

This involves going to the system kernel and requesting the use of $\texttt{N}$ consecutive buckets.

![system_memory.png](attachment:system_memory.png)
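
A rough way to observe this is to compare the memory footprint of lists of different lengths. The sketch below assumes the CPython interpreter, where `sys.getsizeof` reports the size of the list container (its header plus its block of pointers) and not the size of the objects it refers to. The exact numbers vary by platform and Python version.

```python
import struct
import sys

# Bytes used by one pointer on this machine (8 on most 64-bit systems)
pointer_size = struct.calcsize("P")

small = [None] * 10
large = [None] * 1000

# Size of the list containers only, not of the objects they point to
small_bytes = sys.getsizeof(small)
large_bytes = sys.getsizeof(large)

print(pointer_size, small_bytes, large_bytes)
```

The longer list needs a larger block because it has to hold more pointers, even though every slot refers to the same `None` object.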




## Lists

You create a list by enclosing a comma-separated list of elements in square brackets, like so:

In [1]:
# This assigns a three-element list to x
x = [1, 2, 3]
x

[1, 2, 3]

Python lists can contain different types of elements; a list element can be any Python object.

In [2]:
# First element is a number, second is a string, third is another list
x = [2, "two", [1, 2, 3]]
x

[2, 'two', [1, 2, 3]]

The most basic built-in list function is the $\texttt{len}$ function, which returns the number of elements in a list:

In [3]:
len(x)

3

## List indices

Python starts counting from 0; asking for element 0 returns the first element of the list, asking for element 1 returns the second element, and so forth.


In [4]:
x = ["first", "second", "third", "fourth"]
x[0], x[1]

('first', 'second')

If indices are negative numbers, they indicate positions counting from the end of the list, with -1 being the last position in the list, -2 being the second-to-last position, and so forth.

In [5]:
a = x[-1]
b = x[-2]
a,b

('fourth', 'third')

## List slicing

Python can extract or assign to an entire sublist at once, an operation known as *slicing*. Instead of entering <code>list[index]</code> to extract the single item at <code>index</code>, enter <code>list[index1:index2]</code> to extract all items from <code>index1</code> up to (but not including) <code>index2</code> into a new list.

In [55]:
x = ["first", "second", "third", "fourth"]
x[1:-1]

['second', 'third']

In [7]:
x[0:3]

['first', 'second', 'third']

In [57]:
x[-2:-1]

['third']

It's also possible to leave out <code>index1</code> or <code>index2</code>. Leaving out <code>index1</code> means "Go from the beginning of the list" and leaving out <code>index2</code> means "Go to the end of the list":

In [9]:
x[:3]

['first', 'second', 'third']

In [10]:
x[2:]

['third', 'fourth']

Omitting both indices makes a new list that goes from the beginning to the end of the original list; that is, it copies the list.

In [11]:
y = x[:]
y

['first', 'second', 'third', 'fourth']
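
To make the distinction between a copy and a second name for the same list concrete, here is a small sketch. The names `alias` and `x_copy` are illustrative only.

```python
x = ["first", "second", "third", "fourth"]
alias = x        # a second name for the same list object
x_copy = x[:]    # a new list containing the same elements

x_copy.append("fifth")   # only the copy changes
alias.append("sixth")    # x changes too, because alias is x

print(x)
print(x_copy)
```

Note that the slice makes a shallow copy: the new list is a separate container, but its slots refer to the same element objects as the original.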

## Modifying lists

You can use list index notation to modify a list as well as to extract an element from it. Put the index on the left side of the assignment operator:

In [12]:
x = [1, 2, 3, 4, 5]
x[1] = "two"
x

[1, 'two', 3, 4, 5]

Slice notation can be used here too. Typing something like <code>lista[index1:index2] = listb</code> causes all elements of <code>lista</code> between <code>index1</code> and <code>index2</code> to be replaced by the elements in <code>listb</code>.

<code>listb</code> can have more or fewer elements than are removed from <code>lista</code>, in which case the length of <code>lista</code> is altered.



In [13]:
x = [1, 2, 3, 4]
x[len(x):] = [5, 6, 7]   # Append list to end of list
x

[1, 2, 3, 4, 5, 6, 7]

In [14]:
x[:0] = [-1, 0]     # Insert list at front of list
x

[-1, 0, 1, 2, 3, 4, 5, 6, 7]

In [15]:
x[1:-1] = []    # Removes elements from list
x

[-1, 7]
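
Slice assignment in the middle of a list works the same way. The following sketch (with illustrative names `grown` and `shrunk`) shows the list changing length when the replacement has a different number of elements than the slice it replaces.

```python
x = [1, 2, 3, 4, 5]

# Replace two elements (indices 1 and 2) with three: the list grows
x[1:3] = ["a", "b", "c"]
grown = list(x)

# Replace three elements (indices 1 to 3) with one: the list shrinks
x[1:4] = ["z"]
shrunk = list(x)

print(grown)
print(shrunk)
```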

Appending a single element to a list is such a common operation that there's a special <code>append</code> method (associated function) for it:

In [16]:
x = [1, 2, 3]
x.append("four")
x

[1, 2, 3, 'four']

One problem can occur if you try to append one list to another. The list gets appended as a single element of the main list:

In [17]:
x = [1, 2, 3, 4]
y = [5, 6, 7]
x.append(y)
x

[1, 2, 3, 4, [5, 6, 7]]

The <code>extend</code> method is like the <code>append</code> method except that it allows you to add one list to another:


In [18]:
x = [1, 2, 3, 4]
y = [5, 6, 7]
x.extend(y)
x

[1, 2, 3, 4, 5, 6, 7]
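
For comparison, the sketch below places <code>extend</code> next to two other ways of adding all elements of one list to the end of another: slice assignment and the `+=` operator. The names `a`, `b` and `c` are illustrative.

```python
x = [1, 2, 3, 4]
y = [5, 6, 7]

a = list(x)
a.extend(y)          # method call

b = list(x)
b[len(b):] = y       # slice assignment at the end of the list

c = list(x)
c += y               # in-place concatenation

print(a, b, c)
```

All three modify the target list in place and leave `y` unchanged.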

There's also a special <code>insert</code> method to insert a new list element between two existing elements or at the front of the list. <code>insert</code> is used as a method of lists and takes two additional arguments. The first additional argument is the index position in the list where the new element should be inserted, and the second is the new element itself:

In [19]:
x = [1, 2, 3]
x.insert(2,"hello")
x

[1, 2, 'hello', 3]

In [20]:
x.insert(0,"start")
x

['start', 1, 2, 'hello', 3]

It's easiest to think of <code>list.insert(n, element)</code> as meaning "insert <code>element</code> just before the element currently at index *n*".

Anything that can be done with <code>insert</code> can also be done with slice assignment.
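
As a quick check of that equivalence, the sketch below performs the same insertions once with <code>insert</code> and once with slice assignment on an empty slice. The variable names are illustrative.

```python
with_insert = [1, 2, 3]
with_insert.insert(2, "hello")
with_insert.insert(0, "start")

with_slices = [1, 2, 3]
with_slices[2:2] = ["hello"]   # empty slice at index 2: nothing removed, one element added
with_slices[:0] = ["start"]    # empty slice at the front of the list

print(with_insert)
print(with_slices)
```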

The <code>del</code> statement is the preferred method of deleting list items or slices. It doesn't do anything that can't be done with slice assignment, but it's usually easier to remember and easier to read:

In [21]:
x = ["a", 2, "c", 7, 9, 11]
del x[1]
x

['a', 'c', 7, 9, 11]

In [22]:
del x[:2]
x

[7, 9, 11]
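
The same deletions can be written with slice assignment, which may help to see why <code>del</code> is usually considered easier to read. A minimal sketch with illustrative names:

```python
with_del = ["a", 2, "c", 7, 9, 11]
del with_del[1]          # delete the element at index 1
del with_del[:2]         # delete the first two remaining elements

with_slices = ["a", 2, "c", 7, 9, 11]
with_slices[1:2] = []    # same as del with_slices[1]
with_slices[:2] = []     # same as del with_slices[:2]

print(with_del)
print(with_slices)
```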

The <code>remove</code> method isn't the converse of <code>insert</code>. Whereas <code>insert</code> inserts an element at a specified location, <code>remove</code> looks for the first instance of a given value in a list and removes that value from the list:

In [23]:
x = [1, 2, 3, 4, 3, 5]
x.remove(3)
x

[1, 2, 4, 3, 5]

In [24]:
x.remove(3)
x

[1, 2, 4, 5]

In [25]:
x.remove(3)
x

ValueError: list.remove(x): x not in list

If <code>remove</code> can't find anything to remove, it raises an error.
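
Two common ways of dealing with this are to test for membership first or to catch the exception. The sketch below shows both; the flag name `removed` is made up for the example.

```python
x = [1, 2, 4, 5]

# Option 1: check before removing
if 3 in x:
    x.remove(3)

# Option 2: try to remove and handle the error
try:
    x.remove(3)
    removed = True
except ValueError:
    # 3 was not in the list, so there was nothing to remove
    removed = False

print(x, removed)
```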

The <code>reverse</code> method is a more specialized list modification method. It efficiently reverses a list in place:

In [None]:
x = [1, 3, 5, 6, 7]
x.reverse()
x

## Sorting lists

Lists can be sorted by using the built-in Python <code>sort</code> method:

In [None]:
x = [3, 8, 4, 0, 2, 1]
x.sort()
x

Sorting works with strings, too:

In [None]:
x = ["life", "Is", "Enchanting"]
x.sort()
x

The <code>sort</code> method can sort any list whose items can be compared with one another. There's one caveat in sorting: the default key method used by <code>sort</code> requires all items in the list to be of comparable types, so in Python 3 a list that mixes numbers and strings raises a <code>TypeError</code>:

In [None]:
x = [1, 2, "hello", 3]
x.sort()
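
The sketch below collects the three sorting cases from this section in one place. Note that strings are compared by character code, so uppercase letters sort before lowercase ones. The variable names are illustrative.

```python
numbers = [3, 8, 4, 0, 2, 1]
numbers.sort()           # sorts in place, smallest first

words = ["life", "Is", "Enchanting"]
words.sort()             # uppercase letters come before lowercase ones

mixed = [1, 2, "hello", 3]
try:
    mixed.sort()
    sort_failed = False
except TypeError:
    # int and str cannot be compared with each other in Python 3
    sort_failed = True

print(numbers)
print(words)
print(sort_failed)
```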

## Other common list operations
### List membership with the <code>in</code> operator

It's easy to test whether a value is in a list by using the <code>in</code> operator which returns a Boolean value. You can also use the converse, the <code>not in</code>  operator:

In [None]:
3 in [1, 3, 4, 5]

In [None]:
3 not in [1, 3, 4, 5]

In [None]:
3 in ["one", "two", "three"]

In [None]:
3 not in ["one", "two", "three"]

### List concatenation with the <code>+</code> operator

To create a list by concatenating two existing lists, use the <code>+</code> (list concatenation) operator, which leaves the argument lists unchanged:


In [None]:
z = [1, 2, 3] + [4, 5]
z

### List initialization with the <code>*</code> operator

Use the <code>*</code> operator to produce a list of a given size, which is initialized to a given value. This operation is a common one for working with large lists whose size is known ahead of time.

In [None]:
z = [10] * 4
z

### List minimum or maximum with <code>min</code> and <code>max</code>

You can use <code>min</code> and <code>max</code> to find the smallest and largest elements in a list. You'll probably use <code>min</code> and <code>max</code> mostly with numerical lists, but you can use them with lists containing any type of element, provided the elements can be compared with one another; in Python 3, mixing incomparable types raises a <code>TypeError</code>.


In [None]:
min([3, 7, 0, -2, 11])

In [None]:
max([4, "Hello", [1, 2]])  # Raises TypeError in Python 3: int, str and list can't be compared
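
A short sketch of how <code>min</code> and <code>max</code> behave with numbers, with strings, and with a list of mixed types. The list of fruit names is an illustrative assumption, not data from the notebook.

```python
smallest = min([3, 7, 0, -2, 11])

# Strings are compared alphabetically (by character code)
largest_word = max(["pear", "apple", "fig"])

try:
    max([4, "Hello", [1, 2]])
    mixed_failed = False
except TypeError:
    # Python 3 refuses to compare int, str and list with each other
    mixed_failed = True

print(smallest, largest_word, mixed_failed)
```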

### List matches with <code>count</code> 

<code>count</code> searches through a list, looking for a given value, and returns the number of times that the value is found in the list rather than positional information:


In [None]:
x = [1, 2, 2, 3, 5, 2, 5]
x.count(2)

In [None]:
x.count(5)

In [None]:
x.count(4)
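
For reference, this sketch stores the results of the three calls in illustrative variables. A value that does not occur gives a count of 0 instead of an error, unlike <code>remove</code>.

```python
x = [1, 2, 2, 3, 5, 2, 5]

twos = x.count(2)     # 2 appears three times
fives = x.count(5)    # 5 appears twice
fours = x.count(4)    # 4 does not appear: the result is 0, not an error

print(twos, fives, fours)
```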

## Summary of list operations
<img src="images/list_operations.png" alt="Drawing" class="center" style="width: 800px; "/>


# Tuples

*Tuples* are data structures that are very similar to lists, but they can't be modified; they can only be created. Tuples have important roles that can't be efficiently filled by lists, such as keys for dictionaries.
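
As a small illustration of the dictionary-key role, the sketch below uses coordinate tuples as keys. The dictionary `grid` and its contents are hypothetical and exist only for this example.

```python
# Tuples are hashable, so they can be used as dictionary keys
grid = {(0, 0): "origin", (1, 2): "treasure"}
label = grid[(1, 2)]

try:
    grid[[3, 4]] = "not allowed"
    list_key_worked = True
except TypeError:
    # Lists are mutable and therefore unhashable
    list_key_worked = False

print(label, list_key_worked)
```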

## Tuples basics

Creating a tuple is similar to creating a list: assign a sequence of values to a variable. A list is a sequence that's enclosed by [ and ]; a tuple is a sequence that's enclosed by ( and ):

In [26]:
x = ("a", "b", "c", "d")
x

('a', 'b', 'c', 'd')

After a tuple is created, using it is so much like using a list that it's easy to forget that tuples and lists are different data types:

In [None]:
x[2]

In [None]:
x[1:]

In [None]:
len(x)

In [None]:
max(x)

In [None]:
min(x)

In [None]:
5 in x

In [None]:
5 not in x
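
The sketch below gathers the results of these operations for the tuple used in this section. The names on the left-hand side are illustrative.

```python
x = ("a", "b", "c", "d")

third = x[2]              # indexing works as with lists
rest = x[1:]              # slicing a tuple returns a new tuple
size = len(x)
largest = max(x)          # strings compare alphabetically
smallest = min(x)
has_five = 5 in x
lacks_five = 5 not in x

print(third, rest, size, largest, smallest, has_five, lacks_five)
```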

The main difference between tuples and lists is that tuples are immutable. An attempt to modify a tuple results in a confusing error message, which is Python's way of saying that it doesn't know how to set an item in a tuple:

In [27]:
x[2] = "d"

TypeError: 'tuple' object does not support item assignment

You can create tuples from existing ones by using the <code>+</code> and <code>*</code> operators:

In [28]:
x+x

('a', 'b', 'c', 'd', 'a', 'b', 'c', 'd')

In [29]:
2*x

('a', 'b', 'c', 'd', 'a', 'b', 'c', 'd')

A copy of a tuple can be made in any of the same ways as for lists:

In [30]:
z = x[:]
z

('a', 'b', 'c', 'd')

In the case of one-element tuples, Python requires that the element in the tuple be followed by a comma; without the comma, the parentheses are treated as ordinary grouping. In the case of zero-element (empty) tuples, there's no problem.

In [32]:
x = 3
y = 5
( x + y ) # this line adds x and y

8

In [33]:
( x + y, ) # Including a comma indicates that the parentheses denote a tuple.

(8,)

In [34]:
( ) # To create an empty tuple, use an empty pair of  parentheses 

()

## Packing and unpacking tuples

Python permits tuples to appear on the left side of an assignment operator, in which case variables in the tuple receive the corresponding values from the tuple on the right side of the assignment operator:


In [35]:
(one, two, three, four) = (1, 2, 3, 4)
one

1

In [36]:
two

2

This example can be written even more simply, because Python recognizes tuples in an assignment context even without the enclosing parentheses:

In [None]:
one, two, three, four = 1, 2, 3, 4

One line of code has replaced the following four lines of code:

In [38]:
one = 1
two = 2
three = 3
four = 4

Packing and unpacking can also be performed by using list delimiters:

In [39]:
one, two, three, four = [1, 2, 3, 4]
one

1

In [40]:
two

2
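
Two consequences of unpacking are worth a short sketch: swapping two variables without a temporary, and the error raised when the number of names does not match the number of values. The names used are illustrative.

```python
a, b = 1, 2
a, b = b, a        # the right side is packed into a tuple, then unpacked

try:
    first, second = (1, 2, 3)   # three values, only two names
    unpack_failed = False
except ValueError:
    unpack_failed = True

print(a, b, unpack_failed)
```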

## Converting between lists and tuples

Tuples can be easily converted to lists with the <code>list</code> function, which takes any sequence as an argument and produces a new list with the same elements as the original sequence.

Similarly, lists can be converted to tuples with the <code>tuple</code> function, which does the same thing but produces a new tuple instead of a new list:

In [41]:
list((1, 2, 3, 4))

[1, 2, 3, 4]

In [42]:
tuple([1, 2, 3, 4])

(1, 2, 3, 4)
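
Because <code>list</code> accepts any sequence, it also works on strings. Converting back and forth is also one way to build a modified version of a tuple. The sketch below uses illustrative names.

```python
# A string is a sequence of characters
letters = list("Hello")

# "Modify" a tuple by going through a list and back
point = (1, 2, 3)
as_list = list(point)
as_list[0] = 10
new_point = tuple(as_list)   # a new tuple; the original is untouched

print(letters)
print(point, new_point)
```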

# Sets
A *set* in Python is an unordered collection of objects, used when membership and uniqueness in the set are the main things you need to know about an object.

Like dictionary keys, the items in a set must be immutable and hashable. 

This means that ints, floats, strings, and tuples can be members of a set, but lists, dictionaries, and sets themselves can't.
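
The hashability requirement can be checked directly. In this sketch the set contents are arbitrary illustrative values.

```python
# ints, floats, strings and tuples are hashable
valid = {1, 2.5, "three", (4, 5)}

try:
    invalid = {1, [2, 3]}     # a list cannot be a set member
    list_member_worked = True
except TypeError:
    list_member_worked = False

print(len(valid), list_member_worked)
```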

## Set operations

In addition to the operations that apply to collections in general, such as <code>in</code>, <code>len</code>, and iteration in <code>for</code> loops, sets have several set-specific operations:

In [43]:
x = set([1, 2, 3, 1, 3, 5])
x

{1, 2, 3, 5}

In [44]:
x.add(6)
x

{1, 2, 3, 5, 6}

In [45]:
x.remove(5)
x

{1, 2, 3, 6}

In [54]:
y  = set([1, 7, 8, 9])
x | y  # get the union, or combination, of  sets

{1, 2, 3, 6, 7, 8, 9}

In [52]:
x & y # get the intersection 

{1}

In [53]:
x ^ y # find their symmetric difference; that is, elements that are in one set or the other but not both

{2, 3, 6, 7, 8, 9}
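
Each of these operators also has a named method on the built-in <code>set</code> type, which some readers find easier to read. The sketch below checks that the two spellings agree and shows the typical use of a set to drop duplicates from a list. The variable names are illustrative.

```python
x = {1, 2, 3, 6}
y = {1, 7, 8, 9}

union = x.union(y)                        # same as x | y
common = x.intersection(y)                # same as x & y
either = x.symmetric_difference(y)        # same as x ^ y

# Building a set from a list keeps one copy of each value
unique_values = set([1, 2, 3, 1, 3, 5])

print(union, common, either, unique_values)
```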