In [None]:
!git clone https://github.com/nhsbsa-data-analytics/coffee-and-coding

Cloning into 'coffee-and-coding'...
remote: Enumerating objects: 2176, done.
remote: Counting objects: 100% (605/605), done.
remote: Compressing objects: 100% (248/248), done.
remote: Total 2176 (delta 386), reused 494 (delta 345), pack-reused 1571 (from 1)
Receiving objects: 100% (2176/2176), 133.11 MiB | 19.73 MiB/s, done.
Resolving deltas: 100% (987/987), done.
Updating files: 100% (373/373), done.


# Python: Lists and dictionaries - an Introduction

A lot of our Coffee and Coding sessions are focused on using R. It is a good introductory programming language that is used throughout the NHSBSA.

Depending on your skill level, you may already be familiar with R **Vectors** and **Lists**.

**Vectors** are basic data structures that contain elements of the same type.

**Example R Vectors** <br> c(1, 2, 3, 4) or c('a', 'b', 'c', 'd')

**Lists** are a more complicated, but versatile data structure that can contain multiple different data types.

**Example R List** <br>list(<br>
  John = list(
    Surname = "Smith",<br>
    Address = "123 Main Street",<br>
    Postcode = "AB12CD",<br>
    Mobile = "01910029123"<br>
  )<br>)

In this Coffee and Coding session we will be exploring the Python equivalents:

| R | Python |
| :------: | :----: |
|   Vector   | List |
|   List   | Dictionary |


## Python Lists: Introduction

A Python list is one of the most basic structures. To create a new list you can either use square brackets `[ ]`, or you can use the `list()` function.

### Creating a list
Below we will create our first list. It will contain the numbers from 1 to 5. Remember, each element of a list is separated by a comma.

In [None]:
my_first_list = [1, 2, 3, 4, 5]

my_first_list

[1, 2, 3, 4, 5]
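The `list()` function mentioned above builds a list from any iterable. As a small sketch (the variable names here are illustrative and not part of the session), both approaches give the same result:

```python
# Square brackets: write the elements out directly
from_brackets = [1, 2, 3, 4, 5]

# list(): build a list from another iterable, here range(1, 6),
# which counts from 1 up to (but not including) 6
from_function = list(range(1, 6))

# list() applied to a string splits it into single characters
letters = list("abc")

print(from_brackets == from_function)  # True
print(letters)                         # ['a', 'b', 'c']
```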

### Accessing data in a list

Suppose that we want to access one of the elements in our list.

We can do this by selecting the element based on its index position.

<font color="red">**Warning:** Python is a zero indexed programming language. This means the first element has index 0 and **not** 1. This is different to R which starts its indexing from 1.</font>

To select the first element from a list we can do:


In [None]:
my_first_list[0]

1

To select the first two elements we can do:

In [None]:
my_first_list[0:2]

[1, 2]

To select the odd numbers in the list, or more specifically the elements at index positions 0, 2 and 4 (every second element starting from the first), we can do:

In [None]:
my_first_list[0::2]

[1, 3, 5]

And likewise for the even numbers, which sit at index positions 1 and 3:


In [None]:
my_first_list[1::2]

[2, 4]
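The slices above all follow the pattern `list[start:stop:step]`, where the element at `stop` is excluded and any part left blank takes its default. A brief sketch, using an illustrative list called `numbers`, which also shows that negative indexes count back from the end:

```python
numbers = [1, 2, 3, 4, 5]

first_two = numbers[0:2]     # start at index 0, stop before index 2
every_other = numbers[0::2]  # start at index 0, take every 2nd element
last_item = numbers[-1]      # negative indexes count back from the end
last_two = numbers[-2:]      # from the second-to-last element to the end

print(first_two)    # [1, 2]
print(every_other)  # [1, 3, 5]
print(last_item)    # 5
print(last_two)     # [4, 5]
```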

## Python Lists: Common things to do with Lists

Now that we have introduced Python lists, let's take a look at some common functionality.

We will now create a new list, this time using letters rather than numbers.

In [None]:
my_second_list = ['a', 'b', 'c', 'd', 'e','a','b']

my_second_list

['a', 'b', 'c', 'd', 'e', 'a', 'b']

### Calculating the length of a list
Something you may want to do is count how many total elements are within a list.

This can easily be done using the `len()` function, which returns the length of the list.

In [None]:
len(my_second_list)

7

### Number of unique elements in a list

But our list contains duplicates. How can we count the number of unique values within the list?

We can use `len()` in combination with the `set()` function.

A set is similar to a list with one key difference: a list can contain duplicate values, whereas a set can only contain unique values.


In [None]:
len(set(my_second_list))

5
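One thing to be aware of is that a set does not keep the original order of the elements. If the order matters, one common approach (a sketch, not the only option) is `dict.fromkeys()`, which relies on dictionaries preserving insertion order in Python 3.7 and later. The list name `letters` is illustrative:

```python
letters = ['a', 'b', 'c', 'd', 'e', 'a', 'b']

# Count the unique values, as in the cell above
unique_count = len(set(letters))

# Dictionary keys are unique and keep insertion order,
# so this drops duplicates while preserving the original order
unique_in_order = list(dict.fromkeys(letters))

print(unique_count)     # 5
print(unique_in_order)  # ['a', 'b', 'c', 'd', 'e']
```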

### Getting the count of a value
Suppose that you want to find out how often a value appears within the list. This is simple with the `.count()` method.

In [None]:
my_second_list.count('a')

2

### Finding the index of a value
Likewise, we can find the position (index) of a particular value using `.index()`.

**This only identifies the first occurrence of the value**

Later on we will explore a different way to find all occurrences of a value.

In [None]:
my_second_list

['a', 'b', 'c', 'd', 'e', 'a', 'b']

In [None]:
my_second_list.index('d')


3

In [None]:
#Only the first occurrence of 'a' is returned.
my_second_list.index('a')

0
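Note that `.index()` raises a `ValueError` if the value is not in the list at all. A minimal sketch of one way to guard against this, using a hypothetical helper called `find_index` (not part of Python or this session):

```python
letters = ['a', 'b', 'c', 'd', 'e', 'a', 'b']

def find_index(items, value):
    """Return the first index of value, or None if it is not in the list."""
    # The `in` operator checks membership without raising an error
    if value in items:
        return items.index(value)
    return None

print(find_index(letters, 'd'))  # 3
print(find_index(letters, 'z'))  # None
```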

### Adding data to an existing list
Once we have our list we can also add to it using the `.append()` method.

In [None]:
my_second_list.append('f')

In [None]:
my_second_list

['a', 'b', 'c', 'd', 'e', 'a', 'b', 'f']

And if we want to insert at a specific location rather than at the end, we can use `.insert()`, which takes the index to insert at followed by the value.

In [None]:
my_second_list.insert(3,'z')
my_second_list

['a', 'b', 'c', 'z', 'd', 'e', 'a', 'b', 'f']

### Changing the order of a list
We can also reverse the order that elements appear.


In [None]:
my_second_list = ['a', 'b', 'c', 'd', 'e','a','b']

my_second_list.reverse()
my_second_list

['b', 'a', 'e', 'd', 'c', 'b', 'a']

### Sorting

Python lists keep their elements in the order in which they were inserted. They are not sorted automatically, so elements can be added in any order you choose.

However, it may be useful to sort the list for a particular task. Again there is the built in `.sort()` functionality to help with this.

In [None]:
my_second_list.sort()
my_second_list

['a', 'a', 'b', 'b', 'c', 'd', 'e']
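`.sort()` changes the list in place. If the original order needs to be kept, the built-in `sorted()` function returns a new sorted list instead. A short sketch, with `letters` as an illustrative name:

```python
letters = ['b', 'a', 'e', 'd', 'c', 'b', 'a']

# sorted() returns a new list and leaves the original unchanged
sorted_copy = sorted(letters)

# reverse=True sorts in descending order
descending = sorted(letters, reverse=True)

print(letters)      # ['b', 'a', 'e', 'd', 'c', 'b', 'a']
print(sorted_copy)  # ['a', 'a', 'b', 'b', 'c', 'd', 'e']
print(descending)   # ['e', 'd', 'c', 'b', 'b', 'a', 'a']
```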

### Removing elements

And finally, you may be interested in removing an element from the list. This can be done using `.pop()`, which removes the last element by default.

In [None]:
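# Reset the list so the examples below start from a known state
my_second_list = ['b', 'a', 'e', 'd', 'c', 'b', 'a']
my_second_list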

['b', 'a', 'e', 'd', 'c', 'b', 'a']

In [None]:
my_second_list.pop()
my_second_list

['b', 'a', 'e', 'd', 'c', 'b']

Or we can remove the element at a specific index:

In [None]:
my_second_list.pop(2)
my_second_list

['b', 'a', 'd', 'c', 'b']
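It is worth knowing that `.pop()` also returns the element it removes, and that `.remove()` deletes by value rather than by index. A small sketch, again using an illustrative list called `letters`:

```python
letters = ['b', 'a', 'e', 'd', 'c', 'b', 'a']

last = letters.pop()    # removes and returns the last element
third = letters.pop(2)  # removes and returns the element at index 2

# .remove() deletes the first element equal to the given value
letters.remove('b')

print(last)     # a
print(third)    # e
print(letters)  # ['a', 'd', 'c', 'b']
```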

## Looping through lists: List Comprehensions

Often it is useful to loop through the elements of a list.

For example, if we want to apply a mathematical operator to each numeric element.

Note there are simpler ways of doing this using packages, but during this session we will be using lists.

In [None]:
my_third_list = [1,2,3,4,5,6,7,8,9]

my_third_list



[1, 2, 3, 4, 5, 6, 7, 8, 9]

Suppose we want to multiply each element in our list by a factor of 2.

We could do this using a typical for loop.

In [None]:
idx=0
for value in my_third_list:

  my_third_list[idx]=value*2
  idx=idx+1

my_third_list



[2, 4, 6, 8, 10, 12, 14, 16, 18]

### Enumerate function
We could simplify things a little using the `enumerate()` function.

In the loop the `enumerate()` function returns the index and value.

In [None]:
my_third_list = [1,2,3,4,5,6,7,8,9]
for idx,value in enumerate(my_third_list):

  my_third_list[idx]=value*2

my_third_list

[2, 4, 6, 8, 10, 12, 14, 16, 18]

A more pythonic way of looping through the list is to use a comprehension.

A list comprehension follows a basic structure: [*expression* <font color="blue">for</font> *item* <font color="blue"> in</font> *iterable*]

In [None]:
[value*2 for value in my_third_list]

[4, 8, 12, 16, 20, 24, 28, 32, 36]

This is a more elegant alternative to writing a full for loop.
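One difference from the for loops above is that a comprehension builds a new list and leaves the original untouched. A minimal sketch, with `numbers` and `doubled` as illustrative names:

```python
numbers = [1, 2, 3, 4, 5, 6, 7, 8, 9]

# The comprehension returns a new list; assign it to keep the result
doubled = [value * 2 for value in numbers]

print(numbers)  # [1, 2, 3, 4, 5, 6, 7, 8, 9]
print(doubled)  # [2, 4, 6, 8, 10, 12, 14, 16, 18]
```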


We can also add conditions to the comprehension: [*expression* <font color="blue">for</font> *item* <font color="blue"> in</font> *iterable* <font color='blue'> if </font> *condition* == True]

Going back to the earlier example, suppose we want to find every position in the list where `a` occurs. We can do this with a list comprehension and the `enumerate()` function.



In [4]:
my_second_list = ['a', 'b', 'c', 'd', 'e','a','b']

[i for i, value in enumerate(my_second_list) if value == 'a']

[0, 5]

In [None]:
[value*2 for idx,value in enumerate(my_third_list) if idx % 2==0]

[4, 12, 20, 28, 36]

We can also add in an `else` branch. Notice this changes the comprehension structure slightly: the `if ... else` now comes before the `for`.


In [None]:
[value*2 if idx % 2==0 else value  for idx,value in enumerate(my_third_list) ]

[4, 4, 12, 8, 20, 12, 28, 16, 36]

### Zip function

We can also use a list comprehension on multiple lists at the same time.

This could be useful if you want to join two lists together or if you wanted to apply mathematical operators to two numeric lists.

The function we will make use of is called `zip()`. When looping over a `zip()` the ith element of each list will be made available.

In [None]:
# Two separate lists
list1 = [1, 2, 3]
list2 = ['a', 'b', 'c']

# Using list comprehension to create a list of tuples
tuple_list = [(x, y) for x, y in zip(list1, list2)]
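To make the result of `zip()` concrete, here is a sketch that prints the list of tuples and then applies a mathematical operator to two numeric lists. The `prices` and `quantities` lists are made-up values for illustration:

```python
list1 = [1, 2, 3]
list2 = ['a', 'b', 'c']

# Pair up the ith element of each list
tuple_list = [(x, y) for x, y in zip(list1, list2)]
print(tuple_list)  # [(1, 'a'), (2, 'b'), (3, 'c')]

# Element-wise arithmetic on two numeric lists
prices = [10, 20, 30]
quantities = [1, 2, 3]
totals = [price * quantity for price, quantity in zip(prices, quantities)]
print(totals)  # [10, 40, 90]
```

If the lists have different lengths, `zip()` stops at the end of the shortest one.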

### When to, and when not to use list comprehensions
List comprehensions are a great way of writing short and understandable code.

But they are not always the best option.

- When the task is simple, they are usually the best approach to take.

However, when the task is more complicated, list comprehensions can be difficult to understand.

Take the below as an example:

In [None]:
list3=[[1,2,3],[4,5,6],[7,8,9]]

In [None]:
complicated_nested_lc=[[j for j in i if j%2 ==0] for i in list3]
complicated_nested_lc



[[2], [4, 6], [8]]

It's not overly obvious what is going on here.

It is a little bit clearer when there are better labels but it is still tricky to understand.


In [None]:
better_labels_complicated_nested_lc=[[subelement for subelement in element if subelement%2 ==0] for element in list3]

Instead, we could combine a for loop with a list comprehension to try and make things a little clearer.

In [None]:
list4=[]
#loop through every sublist
for sublist in list3:
  #only return the even numbers from the sublist
  list4.append([element for element in sublist if element%2 ==0])



In [None]:
list4

[[2], [4, 6], [8]]

# Dictionaries

## Our first dictionary

Let's create our first dictionary. Remember, a dictionary is a collection of keys and values, where each key maps to one value.

In our example we will use countries as the keys and their capital cities as the values.

In [None]:
captial_cities={'United Kingdom':'London',
                'France':'Paris',
                'Germany':'Berlin'}

In [None]:
captial_cities.keys()

dict_keys(['United Kingdom', 'France', 'Germany'])

In [None]:
captial_cities.values()

dict_values(['London', 'Paris', 'Berlin'])

### Accessing values

Suppose we want to access the value for a particular country. We can do it using a similar notation to lists, using the key in place of an index. The same notation can also be used to add a new key.



In [None]:
captial_cities['United Kingdom']

'London'

In [None]:
captial_cities['Italy']='Rome'
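The last cell above adds a new key by assigning to it. As a short sketch of how to confirm the key is present and loop over every key and value together, the example below uses a fresh dictionary called `capitals` (an illustrative name) so that it runs on its own:

```python
capitals = {'United Kingdom': 'London',
            'France': 'Paris',
            'Germany': 'Berlin'}

# Assigning to a key that does not exist yet adds it to the dictionary
capitals['Italy'] = 'Rome'

# The `in` operator checks whether a key is present
print('Italy' in capitals)  # True
print('Spain' in capitals)  # False

# .items() yields each key together with its value
for country, capital in capitals.items():
    print(f"{country}: {capital}")
```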

## A more complicated example

Dictionaries are not limited to a single value. They can have multiple layers depending on the situation.

Suppose we wanted to create a dictionary which mimics a simple phonebook. We want our phonebook to capture each person's:

1. Name
2. Surname
3. Address
4. Postcode
5. Mobile Number

We can do this by using the name as a key and storing the other attributes as key-value pairs within a second, nested dictionary.

An example of what this looks like can be seen below.

In [None]:
phonebook={'John':{'Surname':'Smith',
                   'Address': '123 Main Street',
                   'Postcode':'AB12CD',
                   'Mobile':'01910029123'},
           'Jane':{'Surname':'Smith',
                   'Address': '123 Main Street',
                   'Postcode':'AB12CD',
                   'Mobile':'01234990243'},
           'Harry':{'Surname':'Potter',
                   'Address': '4 Privet Drive',
                   'Postcode':'NE459RD',
                   'Mobile':'07259201953'}}

### Filtering on keys

Again, to access an individual we can use the `[ ]` notation.

In [None]:
phonebook['Harry']

{'Surname': 'Potter',
 'Address': '4 Privet Drive',
 'Postcode': 'NE459RD',
 'Mobile': '07259201953'}

To access a specific attribute for a person we add in a second `[ ]`:

In [None]:
phonebook['Harry']['Surname']

'Potter'

Sometimes the key may not exist. Selecting values with `[ ]` will raise a `KeyError` if we try to use an unknown key.

In [None]:
#This will create an error
phonebook['James']

KeyError: 'James'

Sometimes we don't want an error if a key is missing. Instead we could use the `.get()` method, which returns `None` when the key is not found (so the cell below displays nothing).

In [None]:
phonebook.get('James')
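`.get()` also accepts an optional second argument, which is returned in place of `None` when the key is missing. A minimal sketch, using a cut-down phonebook so the example runs on its own (the fallback text is made up for illustration):

```python
phonebook = {'Harry': {'Surname': 'Potter', 'Mobile': '07259201953'}}

# No error is raised for a missing key; the result is simply None
missing = phonebook.get('James')

# The second argument is the value to fall back on
with_default = phonebook.get('James', 'Not in phonebook')

# For nested lookups, fall back to an empty dictionary
# so that the second .get() still works
surname = phonebook.get('James', {}).get('Surname')

print(missing)       # None
print(with_default)  # Not in phonebook
print(surname)       # None
```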

### Accessing all values for a specific attribute: List comprehensions to the rescue

If we wanted to get all of the surnames in our dictionary we could use a list comprehension:

In [None]:
[phonebook[key]['Surname'] for key in phonebook.keys()]

['Smith', 'Smith', 'Potter']

### Removing keys

In [None]:
phonebook.pop('John')

{'Surname': 'Smith',
 'Address': '123 Main Street',
 'Postcode': 'AB12CD',
 'Mobile': '01910029123'}

In [None]:
phonebook

{'Jane': {'Surname': 'Smith',
  'Address': '123 Main Street',
  'Postcode': 'AB12CD',
  'Mobile': '01234990243'},
 'Harry': {'Surname': 'Potter',
  'Address': '4 Privet Drive',
  'Postcode': 'NE459RD',
  'Mobile': '07259201953'}}

### Updating information

We can also easily update the values of certain keys and attributes using the `[ ]` notation.

In [None]:
phonebook['Harry']['Address']='Hogwarts'

In [None]:
phonebook

{'Jane': {'Surname': 'Smith',
  'Address': '123 Main Street',
  'Postcode': 'AB12CD',
  'Mobile': '01234990243'},
 'Harry': {'Surname': 'Potter',
  'Address': 'Hogwarts',
  'Postcode': 'NE459RD',
  'Mobile': '07259201953'}}

We can add new keys...

In [None]:
phonebook['Kier']={'Surname':'Starmer',
                   'Address': '10 Downing Street',
                   'Postcode':'SW1A 2AB',
                   'Mobile':'011111111111'}

In [None]:
phonebook

{'Jane': {'Surname': 'Smith',
  'Address': '123 Main Street',
  'Postcode': 'AB12CD',
  'Mobile': '01234990243'},
 'Harry': {'Surname': 'Potter',
  'Address': 'Hogwarts',
  'Postcode': 'NE459RD',
  'Mobile': '07259201953'},
 'Kier': {'Surname': 'Starmer',
  'Address': '10 Downing Street',
  'Postcode': 'SW1A 2AB',
  'Mobile': '011111111111'}}

And we can include special attributes for certain keys.

**Keys do not need to share the same attributes.**

In [None]:
phonebook['Kier']['Occupation']='Prime Minister'

In [None]:
phonebook

{'Jane': {'Surname': 'Smith',
  'Address': '123 Main Street',
  'Postcode': 'AB12CD',
  'Mobile': '01234990243'},
 'Harry': {'Surname': 'Potter',
  'Address': 'Hogwarts',
  'Postcode': 'NE459RD',
  'Mobile': '07259201953'},
 'Kier': {'Surname': 'Starmer',
  'Address': '10 Downing Street',
  'Postcode': 'SW1A 2AB',
  'Mobile': '011111111111',
  'Occupation': 'Prime Minister'}}

And the attributes don't have to be strings. They can be pretty much any Python object.

For example, we could add a list:

In [None]:
phonebook['Harry']['Friends']=['Ron','Hermione']

In [None]:
phonebook

{'Jane': {'Surname': 'Smith',
  'Address': '123 Main Street',
  'Postcode': 'AB12CD',
  'Mobile': '01234990243'},
 'Harry': {'Surname': 'Potter',
  'Address': 'Hogwarts',
  'Postcode': 'NE459RD',
  'Mobile': '07259201953',
  'Friends': ['Ron', 'Hermione']},
 'Kier': {'Surname': 'Starmer',
  'Address': '10 Downing Street',
  'Postcode': 'SW1A 2AB',
  'Mobile': '011111111111',
  'Occupation': 'Prime Minister'}}
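Because entries no longer share the same attributes, looking up an attribute such as `'Occupation'` with `[ ]` for every key would raise a `KeyError` for the entries that lack it. One possible way around this, sketched here on a cut-down phonebook with an assumed fallback of `'Unknown'`, is to use `.get()` inside a comprehension:

```python
phonebook = {
    'Jane': {'Surname': 'Smith'},
    'Harry': {'Surname': 'Potter', 'Friends': ['Ron', 'Hermione']},
    'Kier': {'Surname': 'Starmer', 'Occupation': 'Prime Minister'},
}

# .get() supplies a fallback for entries without an 'Occupation'
occupations = {name: details.get('Occupation', 'Unknown')
               for name, details in phonebook.items()}

print(occupations)
# {'Jane': 'Unknown', 'Harry': 'Unknown', 'Kier': 'Prime Minister'}
```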

### Keys have to be unique

It is common for people to know two or more people with the same name.

What happens if we want to add a new Harry to the phonebook?

In [None]:
phonebook['Harry']={'Surname':'Smith',
                   'Address': '1 Monoploy Road',
                   'Postcode':'ACD201D',
                   'Mobile':'098200534631'
                   }

In [None]:
phonebook

{'Jane': {'Surname': 'Smith',
  'Address': '123 Main Street',
  'Postcode': 'AB12CD',
  'Mobile': '01234990243'},
 'Harry': {'Surname': 'Smith',
  'Address': '1 Monoploy Road',
  'Postcode': 'ACD201D',
  'Mobile': '098200534631'},
 'Kier': {'Surname': 'Starmer',
  'Address': '10 Downing Street',
  'Postcode': 'SW1A 2AB',
  'Mobile': '011111111111',
  'Occupation': 'Prime Minister'}}

The new Harry's details have been added to the dictionary but have overwritten Harry Potter's details, including his list of friends.

This is because dictionary keys have to be **unique**.

This is a problem for us. We can't have an incomplete phonebook.

How could we fix this problem?

One way of solving this is to create a unique contact ID for each entry.

We could use a combination of name and surname, but there is still a chance that this will not be unique.

In [None]:
phonebook={'Contact1':{'Name':'John',
                   'Surname':'Smith',
                   'Address': '123 Main Street',
                   'Postcode':'AB12CD',
                   'Mobile':'01910029123'},
           'Contact2':{'Name':'Jane',
                   'Surname':'Smith',
                   'Address': '123 Main Street',
                   'Postcode':'AB12CD',
                   'Mobile':'01234990243'},
           'Contact3':{'Name':'Harry',
                    'Surname':'Potter',
                   'Address': '4 Privet Drive',
                   'Postcode':'NE459RD',
                   'Mobile':'07259201953'},
           'Contact4':{'Name':'Harry',
                    'Surname':'Smith',
                   'Address': '1 Monoploy Road',
                   'Postcode':'ACD201D',
                   'Mobile':'098200534631'}}
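With contact IDs as keys, the name becomes an ordinary attribute, so finding everyone called Harry means filtering on that attribute. A sketch of how this could look, using a cut-down version of the phonebook so that it runs on its own:

```python
phonebook = {
    'Contact1': {'Name': 'John', 'Surname': 'Smith'},
    'Contact3': {'Name': 'Harry', 'Surname': 'Potter'},
    'Contact4': {'Name': 'Harry', 'Surname': 'Smith'},
}

# Collect the IDs of every contact whose name is Harry
harry_ids = [contact_id
             for contact_id, details in phonebook.items()
             if details['Name'] == 'Harry']

# Look up the surname for each matching ID
harry_surnames = [phonebook[contact_id]['Surname'] for contact_id in harry_ids]

print(harry_ids)       # ['Contact3', 'Contact4']
print(harry_surnames)  # ['Potter', 'Smith']
```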

### Dictionary comprehension: Dynamically create dictionaries using lists

Dictionaries are often a good way to store results. For example, they could hold different versions of an output, or the output from different machine learning models.

We don't want to create our dictionary by hand, so we can make use of dictionary comprehensions and the `zip()` function we learnt about earlier.

Suppose all of our output is stored in lists. We can convert this to a dictionary object by using the following code:



In [None]:
keys=['Contact1','Contact2','Contact3']
names=['John','Jane','Harry']
surnames=['Smith','Smith','Potter']
addresses=['123 Main Street','123 Main Street','4 Privet Drive']
postcodes=['AB12CD','AB12CD','NE459RD']

# The list is called keys so that it is not confused with the loop variable key.
# zip() makes the ith element of each list available on each pass of the loop.
{key:{'name':name,
      'surname':surname,
      'address':address,
      'postcode':postcode
      }
      for key,name,surname,address,postcode in zip(keys,names,surnames,addresses,postcodes)
}

{'Contact1': {'name': 'John',
  'surname': 'Smith',
  'address': '123 Main Street',
  'postcode': 'AB12CD'},
 'Contact2': {'name': 'Jane',
  'surname': 'Smith',
  'address': '123 Main Street',
  'postcode': 'AB12CD'},
 'Contact3': {'name': 'Harry',
  'surname': 'Potter',
  'address': '4 Privet Drive',
  'postcode': 'NE459RD'}}
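When each key only needs a single value rather than a nested dictionary, a comprehension is not strictly necessary. As a simpler sketch, `dict()` can be applied directly to the output of `zip()`; the name `name_lookup` is illustrative:

```python
contact_ids = ['Contact1', 'Contact2', 'Contact3']
names = ['John', 'Jane', 'Harry']

# zip() pairs each ID with a name, and dict() turns the pairs into key-value entries
name_lookup = dict(zip(contact_ids, names))

print(name_lookup)
# {'Contact1': 'John', 'Contact2': 'Jane', 'Contact3': 'Harry'}
```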

# Conclusion

This has been an introduction to Python Lists and Dictionaries. It is not a comprehensive introduction as there are many different and interesting ways to use these objects.

Hopefully this session has given you a good foundational knowledge of Lists and Dictionaries.

This will also hopefully be a useful resource if anybody is interested in learning Python.