<img src="./intro_images/introbanner.png" width="100%" align="left" />

<table style="float:right;">
    <tr>
        <td>                      
            <div style="text-align: right"><a href="https://alandavies.netlify.com" target="_blank">Dr Alan Davies</a></div>
            <div style="text-align: right">Lecturer health data science</div>
            <div style="text-align: right">University of Manchester</div>
         </td>
         <td>
             <img src="./intro_images/alan.png" width="30%" />
         </td>
     </tr>
</table>

# Data Structures
****

#### About this Notebook
This notebook looks at Python's four built-in data structures (lists, dictionaries, sets and tuples) and how to work with them.

This notebook is at <code>Beginner</code> level and will take approximately 45 minutes to complete.


<div class="alert alert-block alert-warning"><b>Learning Objectives:</b> 
<br/> This notebook will help you start to:
    
- Express a clear understanding of the basic principles of the Python programming language.

</div>

So far we have been using simple variables to store single items of data for our programs. Python supports more advanced data structures for storing and organising data. These include **`lists`**, **`tuples`**, **`sets`** and **`dictionaries`**. We will examine each in turn.

### 1.0 Lists

A list can hold multiple items in a single variable. Below we create a list called **`fruit`** and, using square brackets, add three items (in this case strings) separated by commas. We can use the **`len()`** function to see how many items (**`elements`**) the list contains.

In [1]:
fruit = ['apple', 'pear', 'banana']

In [2]:
fruit

['apple', 'pear', 'banana']

In [3]:
len(fruit)

3

To access the individual elements (items) of our list, we can use the index of the element. 

In [4]:
fruit[0]

'apple'

In [5]:
fruit[1]

'pear'

<div class="alert alert-success">
<b>Note:</b> In Python the index of the first element of a list is <code>0</code> and <b>not</b> 1.
</div>

<img src="./intro_images/list.png" width="500" />

The image above shows another way of looking at our fruit list. You can imagine a list as a series of connected boxes, each of which can contain some data (in this case the names of some fruit). We can now pass this whole list around like a single variable. This is very useful if you want to store a lot of data (for example all the players in a football team or a list of patients having a knee replacement operation). To access the data in an individual box you use its index number (the numbers on the bottom of the boxes in the image above, starting at 0).

An empty list can be defined by using empty brackets:

In [7]:
my_list = []

In [8]:
my_list

[]

In [9]:
len(my_list)

0

Another useful feature of lists is that each element (box) can contain data of a different type. This means we could store a string at index 0, a number at index 1 and even another list or other data structure at another index. For example:

In [44]:
my_list = ['some string', 23, 13.3, [1, 2, 5]]
print(my_list)

['some string', 23, 13.3, [1, 2, 5]]


In [45]:
print(my_list[1])

23


In [46]:
print(my_list[3])

[1, 2, 5]


We can use the square brackets again to get to an element in our list within a list (nested list), such as the value 2:

In [47]:
print(my_list[3][1])

2


In this example we get element 3 of **`my_list`** which contains the other list and then element 1 of that list which contains the number 2. This means we can create and store data in more complex structures. 

<div class="alert alert-block alert-info">
<b>Task 1:</b>
<br> 
1. Print the value <code>5</code> contained in the nested list stored at index 3 of <code>my_list</code><br>
2. At which index is the value 13.3 stored in <code>my_list</code>?
</div>

In [50]:
print(my_list[3][2])

5


The value 13.3 is stored at index 2.

We can add items to the list using the **`append()`** function. In the example below we use this function to add the number 5 to the end of the list.

In [14]:
my_list = [1, 2, 3, 4]
my_list.append(5)
print(my_list)

[1, 2, 3, 4, 5]


In a similar way we can use the **`del`** keyword to remove the item at a specific index from the list. For example, to remove the number 2 (stored at index 1) from the list:

In [15]:
del my_list[1]
print(my_list)

[1, 3, 4, 5]


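We can also return part of a list (a slice) using the colon (:) operator. The start index goes on the left of the colon and the end index on the right. The element at the start index is included but the element at the end index is not. If the start index is left out the slice begins at the first element, and if the end index is left out the slice runs to the last element. A negative index counts back from the end of the list, so -1 refers to the last element: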

In [16]:
print(my_list[1:3])

[3, 4]


In [17]:
print(my_list[1:])

[3, 4, 5]


In [18]:
print(my_list[:])

[1, 3, 4, 5]


In [19]:
print(my_list[:3])

[1, 3, 4]


In [20]:
print(my_list[-1])

5
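
The slicing rules above can be collected into one short sketch. The list `letters` and the variable names are made up for illustration and are not part of the earlier cells:

```python
letters = ['a', 'b', 'c', 'd', 'e']

# The start index is included, the end index is not
middle = letters[1:3]       # ['b', 'c']

# Leaving out the start begins at index 0,
# leaving out the end runs to the last element
first_two = letters[:2]     # ['a', 'b']
from_two = letters[2:]      # ['c', 'd', 'e']

# Negative indices count back from the end of the list
last = letters[-1]          # 'e'
last_two = letters[-2:]     # ['d', 'e']

print(middle, first_two, from_two, last, last_two)

# A slice returns a new list, so the original is unchanged
print(letters)
```

Because a slice is a new list, `letters` still holds all five items after the slices are taken.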


We can also use the plus operator to add lists together:

In [21]:
[2, 4, 5] + [2, 5]

[2, 4, 5, 2, 5]

There are many other methods (functions) that can be used with lists. They are used by writing the list name followed by a dot and then the name of the function. For example, the **`pop()`** function removes the last element of a list and returns it:

In [22]:
my_list = [1, 2, 3, 4, 5]
print(my_list.pop())

5


Some commonly used list functions:

<style type="text/css">
.tg  {border-collapse:collapse;border-spacing:0;}
.tg td{font-family:Arial, sans-serif;font-size:14px;padding:10px 5px;border-style:solid;border-width:1px;overflow:hidden;word-break:normal;border-color:black;}
.tg th{font-family:Arial, sans-serif;font-size:14px;font-weight:normal;padding:10px 5px;border-style:solid;border-width:1px;overflow:hidden;word-break:normal;border-color:black;}
.tg .tg-5ua9{font-weight:bold;text-align:left}
.tg .tg-s268{text-align:left}
.tg .tg-0lax{text-align:left;vertical-align:top}
</style>
<table class="tg">
  <tr>
    <th class="tg-5ua9">Function</th>
    <th class="tg-5ua9">Description</th>
    <th class="tg-5ua9">Example</th>
  </tr>
  <tr>
    <td class="tg-5ua9">append()</td>
    <td class="tg-s268">Adds an item to the end of a list</td>
    <td class="tg-s268">my_list.append(2)</td>
  </tr>
  <tr>
    <td class="tg-s268">extend()</td>
    <td class="tg-s268">Add multiple items to list</td>
    <td class="tg-s268">my_list.extend([1,2,4])</td>
  </tr>
  <tr>
    <td class="tg-s268">insert()</td>
    <td class="tg-s268">Add an item to list in specific position</td>
    <td class="tg-s268">my_list.insert(0, "Hello")</td>
  </tr>
  <tr>
    <td class="tg-s268">remove()</td>
    <td class="tg-s268">Remove a specified item from a list</td>
    <td class="tg-s268">my_list.remove("Hello")</td>
  </tr>
  <tr>
    <td class="tg-s268">pop()</td>
    <td class="tg-s268">Remove and return the item at a specified position, or the last item if no position is given</td>
    <td class="tg-s268">my_list.pop()</td>
  </tr>
  <tr>
    <td class="tg-s268">clear()</td>
    <td class="tg-s268">Remove all items in a list</td>
    <td class="tg-s268">my_list.clear()</td>
  </tr>
  <tr>
    <td class="tg-s268">index()</td>
    <td class="tg-s268">Find the index of an item in a list</td>
    <td class="tg-s268">my_list.index("Hello")</td>
  </tr>
  <tr>
    <td class="tg-s268">count()</td>
    <td class="tg-s268">Number of times an item appears in a list</td>
    <td class="tg-s268">my_list.count("apples")</td>
  </tr>
  <tr>
    <td class="tg-s268">sort()</td>
    <td class="tg-s268">Order items in a list (use reverse=True to reverse the order)</td>
    <td class="tg-s268">my_list.sort()</td>
  </tr>
  <tr>
    <td class="tg-s268">reverse()</td>
    <td class="tg-s268">Reverse elements in a list</td>
    <td class="tg-s268">my_list.reverse()</td>
  </tr>
  <tr>
    <td class="tg-0lax">copy()</td>
    <td class="tg-0lax">Make a copy of a list</td>
    <td class="tg-0lax">my_list.copy()</td>
  </tr>
</table>
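
As a rough illustration of how several of the functions in the table behave together, the sketch below works through them on a made-up list called `shopping` (the list and variable names are assumptions for this example only):

```python
shopping = ['pears', 'apples']

shopping.extend(['grapes', 'apples'])    # add several items at once
shopping.insert(0, 'bananas')            # add an item at index 0
# shopping is now ['bananas', 'pears', 'apples', 'grapes', 'apples']

apple_count = shopping.count('apples')   # 2
pear_index = shopping.index('pears')     # 1

shopping.remove('apples')                # removes only the first 'apples'
# shopping is now ['bananas', 'pears', 'grapes', 'apples']

backup = shopping.copy()                 # a separate copy of the list

shopping.sort(reverse=True)              # sorts the list itself, Z to A
# shopping is now ['pears', 'grapes', 'bananas', 'apples']

removed = shopping.pop(0)                # remove and return the item at index 0

print(removed)      # pears
print(shopping)     # ['grapes', 'bananas', 'apples']
print(backup)       # ['bananas', 'pears', 'grapes', 'apples']
```

Note that `sort()` and `pop()` change the list they are called on, while `backup` keeps the earlier contents because it was made with `copy()`.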

<div class="alert alert-block alert-info">
<b>Task 2:</b>
<br> 
1. Make a new list with at least 5 string items<br>
2. Use the <code>sort()</code> function to put your list into order<br>
3. Use the <code>pop()</code> function to remove and print the last element of your sorted list
</div>

In [23]:
my_new_list = ['apples', 'oranges', 'bananas', 'pears', 'grapes']
my_new_list.sort()
print(my_new_list.pop())

pears


### 2.0 Dictionaries

Another useful data structure in Python is a **`dictionary`**. Dictionaries allow multiple data items to be labelled and included in a single data structure. Consider the following example that stores some medical information about a patient:

<img src="./intro_images/dr.jpg" width="500" />

In [2]:
med_data = {
    "name": "Mike Smith",
    "dob": "13/12/1979",
    "age": 40,
    "NHS number": 1223322334,
    "BP": "120/80",
    "HR": 76,
    "PMH" : ["diabetes", "hypertension", "atrial fibrillation"]
}

In [3]:
print(med_data)

{'name': 'Mike Smith', 'dob': '13/12/1979', 'age': 40, 'NHS number': 1223322334, 'BP': '120/80', 'HR': 76, 'PMH': ['diabetes', 'hypertension', 'atrial fibrillation']}


Here we can store multiple items of information, each with an associated label, in a single data structure. Each label and its item together are called a **`key value`** pair. In this example each label (key) is a string placed in quotes, followed by a colon and then the value. The pairs are separated by commas. We can then access each item (value) using its label (key), in a similar way to a list but with the key in place of the index, e.g.:

In [4]:
print(med_data["BP"])

120/80


In [5]:
print(med_data["name"])

Mike Smith


In [6]:
print(med_data["PMH"])

['diabetes', 'hypertension', 'atrial fibrillation']


In [7]:
print(med_data["PMH"][2])

atrial fibrillation
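
Asking for a key that is not in the dictionary with square brackets raises a `KeyError`. The sketch below shows some common ways to look up values more safely. It uses a smaller made-up dictionary called `patient`, and the `in` keyword, `get()` and `items()` are standard dictionary features that are not otherwise covered in this notebook:

```python
patient = {"name": "Mike Smith", "HR": 76}

# Check whether a key exists before using it
has_hr = "HR" in patient      # True
has_bp = "BP" in patient      # False

# get() returns None (or a default you supply) if the key is missing,
# instead of raising a KeyError
bp = patient.get("BP")
bp_or_default = patient.get("BP", "not recorded")

print(has_hr, has_bp, bp, bp_or_default)

# items() lets us loop over each key and its value
for key, value in patient.items():
    print(key, "->", value)
```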


We can create an empty dictionary by doing either of the following:

In [8]:
my_dict = {}
my_dict = dict()

In [9]:
type(my_dict)

dict

We can also change the item stored under a key, in the same way as we would change an element of a list. For example, changing the blood pressure (BP) value:

In [10]:
med_data["BP"] = "132/76"
print(med_data)

{'name': 'Mike Smith', 'dob': '13/12/1979', 'age': 40, 'NHS number': 1223322334, 'BP': '132/76', 'HR': 76, 'PMH': ['diabetes', 'hypertension', 'atrial fibrillation']}
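
The same square bracket syntax also adds a new key if it does not exist yet, and the `del` keyword seen earlier with lists removes a key and its value. A minimal sketch, using a made-up `patient` dictionary and a hypothetical `temp` key:

```python
patient = {"name": "Mike Smith", "HR": 76}

patient["temp"] = 36.8    # key does not exist yet, so it is added
patient["HR"] = 80        # key already exists, so its value is replaced
del patient["name"]       # remove a key and its value

print(patient)            # {'HR': 80, 'temp': 36.8}
```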


<div class="alert alert-block alert-info">
<b>Task 3:</b>
<br> 
1. Add the medical condition <code>irritable bowel syndrome (IBS)</code> to the past medical history <code>(PMH)</code> in the dictionary and print the result
</div>

In [11]:
med_data["PMH"].append("IBS")
print(med_data)

{'name': 'Mike Smith', 'dob': '13/12/1979', 'age': 40, 'NHS number': 1223322334, 'BP': '132/76', 'HR': 76, 'PMH': ['diabetes', 'hypertension', 'atrial fibrillation', 'IBS']}


### 3.0 Sets

A set in mathematics is an unordered collection of distinct objects, called elements, written as comma-separated items enclosed in curly brackets. Elements of a set often share some common property (e.g. a set of clothes, prime numbers etc.).

$$A = \{1, 2, 3, 4\} $$

If an element is in a set we write $2 \in A$. If an element is not in a set we write $27 \notin A$. In Python, sets can be created either by using curly brackets, in the same way as they are written in math notation (**`A`**), or by using the set constructor (**`B`**). Note that empty curly brackets create an empty dictionary, so an empty set has to be created with `set()`.

In [34]:
A = {1, 2, 3, 4}
print(type(A))

<class 'set'>


In [35]:
B = set(("dog", "cat", "pig"))
print(type(B))

<class 'set'>


We can use the keywords **`in`** and **`not in`** in the same way as $\in$ and $\notin$.

In [36]:
print(2 in A)
print(27 not in A)

True
True


In [37]:
C = {1, 2, 3, 4, 1, 2}
print(C)
print(type(C))

{1, 2, 3, 4}
<class 'set'>


<div class="alert alert-block alert-info">
<b>Task 4:</b>
<br> 
What happened in the cell above? Why does it only show <code>1, 2, 3, 4</code>?
</div>

Remember sets must contain distinct elements so the duplicates are ignored. 
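
One practical use of this behaviour is removing duplicates from a list by passing it to the set constructor. The sketch below uses a made-up list of diagnoses:

```python
diagnoses = ['diabetes', 'asthma', 'diabetes', 'hypertension', 'asthma']

# Converting the list to a set keeps one copy of each value
unique_diagnoses = set(diagnoses)

print(len(diagnoses))             # 5
print(len(unique_diagnoses))      # 3

# Sets are unordered, so sorted() is used to get a predictable order
print(sorted(unique_diagnoses))   # ['asthma', 'diabetes', 'hypertension']
```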

There are several useful functions for working with sets, including:

<table class="tg">
  <tr>
    <th class="tg-5ua9">Function</th>
    <th class="tg-5ua9">Description</th>
    <th class="tg-5ua9">Example</th>
  </tr>
  <tr>
    <td class="tg-5ua9">clear()</td>
    <td class="tg-s268">Removes all set elements</td>
    <td class="tg-s268">my_set.clear()</td>
  </tr>
  <tr>
    <td class="tg-s268">add()</td>
    <td class="tg-s268">Add item to set</td>
    <td class="tg-s268">my_set.add(1)</td>
  </tr>
  <tr>
    <td class="tg-s268">copy()</td>
    <td class="tg-s268">Returns copy of a set</td>
    <td class="tg-s268">B = A.copy()</td>
  </tr>
  <tr>
    <td class="tg-s268">remove()</td>
    <td class="tg-s268">Remove a specified element from a set</td>
    <td class="tg-s268">my_set.remove("apple")</td>
  </tr>
  <tr>
    <td class="tg-s268">pop()</td>
    <td class="tg-s268">Remove and return an arbitrary element from a set</td>
    <td class="tg-s268">my_set.pop()</td>
  </tr>
  <tr>
    <td class="tg-s268">discard()</td>
    <td class="tg-s268">Remove a specified item from a set (no error if the item is missing, unlike remove())</td>
    <td class="tg-s268">my_set.discard("apple")</td>
  </tr>
  <tr>
    <td class="tg-s268">difference()</td>
    <td class="tg-s268">A set with the difference between sets</td>
    <td class="tg-s268">A = B.difference(C)</td>
  </tr>
  <tr>
    <td class="tg-s268">union()</td>
    <td class="tg-s268">A set with a union of sets</td>
    <td class="tg-s268">A = B.union(C)</td>
  </tr>
  <tr>
    <td class="tg-s268">issubset()</td>
    <td class="tg-s268">Returns True if a set is contained within another, otherwise False</td>
    <td class="tg-s268">A = B.issubset(C)</td>
  </tr>
  <tr>
    <td class="tg-s268">intersection()</td>
    <td class="tg-s268">Return set where items exist in both sets</td>
    <td class="tg-s268">A = B.intersection(C)</td>
  </tr>
  <tr>
    <td class="tg-0lax">isdisjoint()</td>
    <td class="tg-0lax">Returns True if the sets have no elements in common, otherwise False</td>
    <td class="tg-0lax">A = B.isdisjoint(C)</td>
  </tr>
</table>
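
To make the table more concrete, here is a small sketch of the main set functions applied to two made-up sets. The variable names are chosen only for this example:

```python
A = {1, 2, 3, 4}
B = {3, 4, 5}

both = A.union(B)              # {1, 2, 3, 4, 5}: everything in either set
shared = A.intersection(B)     # {3, 4}: only items in both sets
only_in_a = A.difference(B)    # {1, 2}: items in A that are not in B

is_subset = {1, 2}.issubset(A)          # True: 1 and 2 are both in A
disjoint_from_b = A.isdisjoint(B)       # False: 3 and 4 are shared
disjoint_from_other = A.isdisjoint({7, 8})   # True: nothing in common

# discard() does nothing if the item is missing,
# whereas remove() raises a KeyError
A.discard(99)
try:
    A.remove(99)
    remove_failed = False
except KeyError:
    remove_failed = True

print(both, shared, only_in_a)
print(is_subset, disjoint_from_b, disjoint_from_other, remove_failed)
```

`union()`, `intersection()` and `difference()` each return a new set, so `A` and `B` themselves are left unchanged by them.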


### 4.0 Tuples

Python also supports **`tuples`**. A tuple is an ordered and immutable (unchangeable) finite collection: once a tuple has been created, items cannot be added, removed or changed. In math notation tuples are usually placed within round or angle brackets, $(1, 2, 3, 4)$ or $\langle 1, 2, 3, 4 \rangle$. In Python they can be created in two ways: with round brackets or with the tuple constructor.

In [38]:
my_tuple = (1, 2, 3, 4)
print(my_tuple)
print(type(my_tuple))

(1, 2, 3, 4)
<class 'tuple'>


In [39]:
another_tuple = tuple((1, 2, 3, 4, 1))
print(another_tuple)
print(type(another_tuple))

(1, 2, 3, 4, 1)
<class 'tuple'>


<div class="alert alert-success">
<b>Note:</b> Unlike sets, tuples can have repeated elements. But why use them? If you are working with a collection of values that doesn't change (constants), tuples are typically faster (more efficient) than lists. They are also safer because they can't be changed: it is impossible to overwrite, add or remove elements by accident.
</div>
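
The sketch below illustrates what "immutable" means in practice, using a made-up tuple called `reading`. Elements can be read by index just like a list, but trying to change one raises a `TypeError`. The one-item tuple at the end is an extra detail not covered above: it needs a trailing comma, otherwise the round brackets are treated as ordinary brackets.

```python
reading = (120, 80)

# Elements are read by index, just like a list
systolic = reading[0]      # 120

# Trying to change an element raises a TypeError
try:
    reading[0] = 132
except TypeError as error:
    message = str(error)
    print("Tuples cannot be changed:", message)

print(reading)             # (120, 80), still unchanged

# A tuple with a single item needs a trailing comma
single = (7,)
not_a_tuple = (7)
print(type(single), type(not_a_tuple))
```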

#### Notebook details
<br>
<i>Notebook created by <strong>Dr. Alan Davies</strong> with <strong>Frances Hooley</strong>
    

Publish date: October 2020<br>
Review date: October 2021</i>
