<a href="https://colab.research.google.com/github/magdagucman/programming-for-data/blob/main/worksheets/Lists_and_Tuples_pynb.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Lists and tuples

Often we need to store a number of single items of data together so that they can be processed together. This might be because all the data refers to one person (e.g. name, age, gender, etc.) or because we have a set of data (e.g. all the items that should be displayed in a drop-down list, such as all the years from this year back to 100 years ago, so that someone can select their year of birth).

Python has a range of data structures available including:
*   lists  
*   tuples  
*   dictionaries  
*   sets

This worksheet looks at lists and tuples.

## List
A list is an ordered collection of related, individual data objects that are indexed and can be processed as a whole, as subsets or as individual items.  Lists are stored, essentially, as contiguous items in memory so that access can be as quick as possible.  However, they are mutable (they can be changed after they are created and stored), so the storage mechanism needs extra functionality to deal with changing list sizes.

## Tuple
A tuple is essentially the same as a list, but it is immutable: once it has been created it can't be changed.  It is stored in memory as contiguous items, with the size required fixed right from the start.  This can make it slightly faster to create and access than a list.
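
To make the difference concrete, the short sketch below changes an item in a list and then attempts the same change on a tuple. The names and values (`colours_list`, `colours_tuple`) are made up for illustration and are not part of the worksheet's data.

```python
# Hypothetical example values, chosen only for illustration
colours_list = ['red', 'green', 'blue']
colours_tuple = ('red', 'green', 'blue')

# Lists are mutable: items can be replaced or added after creation
colours_list[0] = 'yellow'
colours_list.append('purple')

# Tuples are immutable: assigning to an item raises a TypeError
try:
  colours_tuple[0] = 'yellow'
  tuple_changed = True
except TypeError:
  tuple_changed = False

print(colours_list)
print(colours_tuple)
print("Tuple changed:", tuple_changed)
```

The list ends up with its first item replaced and one item added, while the tuple is left exactly as it was created.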

The code below will create two lists and a tuple.
*   the first list contains 1000 random numbers between 1 and 100
*   the second list is of random length (up to 5000) and each item is one of the 9 characteristics that are protected under the Equality Act in the UK.
*   the tuple contains the 9 protected characteristics

**Before you start the exercises, run the code below.**  It will generate the lists and tuple so that you can use them in the exercises.  If you need to recreate the lists (because you have changed them and need to work on the originals), just run this cell again.

***Note:***  *a list variable holds a reference to the list in memory, rather than storing the list itself.  This means that if you assign the list to another variable (intending to make a copy), only the reference is copied.  Both variables then refer to the same list, so a change made through one is visible through the other.*

*To make an independent copy, create a new list and copy all the items across, for example with a loop (as in Exercise 4) or with the built-in `list_name.copy()` method.*
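
The note above can be checked with a small sketch. The list `original` and the other variable names here are made up for illustration; the loop copy is the approach the worksheet asks for, and `list.copy()` is shown alongside it as a built-in alternative.

```python
original = [1, 2, 3]   # hypothetical example list

# Plain assignment copies only the reference: both names refer to one list
alias = original

# Copying item by item with a loop creates a separate list
copy_by_loop = []
for item in original:
  copy_by_loop.append(item)

# list.copy() is a built-in way to do the same thing
copy_by_method = original.copy()

# Changing the list through the alias also changes what `original` refers to
alias.append(4)

print(original)           # the appended item shows up here too
print(alias is original)  # True: one list, two names
print(copy_by_loop)       # unaffected by the append
print(copy_by_method)     # unaffected by the append
```

Both copies are shallow: they hold the same items as the original, which is all that is needed for lists of numbers or strings.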

In [178]:
### RUN THIS CODE EVERY TIME YOU NEED LISTS TO WORK WITH IN THE EXERCISES BELOW ###
from random import randint, choice

def get_num_list():
  num_list = [randint(1,100) for n in range(1000)]
  return num_list

def get_protected_characteristics():
  characteristics_tuple = ('age','disability','gender reassignment','marriage and civil partnership','pregnancy and maternity','race','religion or belief','sex','sexual orientation')
  return characteristics_tuple

def get_protected_characteristic_list(protected_characteristics):
  char_list = [choice(protected_characteristics) for ch in range(randint(1,5000))]
  return char_list

nums = get_num_list()
protected_characteristics = get_protected_characteristics()
characteristics = get_protected_characteristic_list(protected_characteristics)

## The exercises below will use the lists:  
*   **nums** (a list of between 1 and 1000 random numbers, each number is between 0 and 1000)
*   **characteristics** (a list of 5000 random protected_characteristics)

and the tuple:
*  **protected_characteristics** (a tuple of the 9 protected characteristics identified in the Equality Act)

## You can run the cell above any number of times to generate new lists.

---
### Exercise 1 - list head, tail and shape

Write a function, **describe_list()** which will:
*  print the length of the list `nums`
*  print the first 10 items in `nums`  
*  print the last 5 items in `nums`  

*Expected output*:  
The length of nums is *1000* (note: the setup code above always generates 1000 numbers)    
First 10 items: [a list of 10 numbers]  
Last 5 items: [a list of 5 numbers]
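
Slicing is one way to pick out the head and tail of a list. The sketch below uses a short made-up list (`letters`) rather than `nums`, so the results are easy to check by eye.

```python
letters = ['a', 'b', 'c', 'd', 'e', 'f', 'g']   # hypothetical example list

first_three = letters[:3]    # items at positions 0, 1 and 2
last_two = letters[-2:]      # negative positions count back from the end
too_many = letters[:10]      # asking for more items than exist is not an error

print(len(letters))
print(first_three)
print(last_two)
print(too_many)
```

A slice always returns a new list, so `letters` itself is not changed by any of these lines.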


In [179]:
def describe_list():
  print("The length of nums is", len(nums))
  print("First 10 items", nums[:10])
  print("Last 5 items", nums[(-5):])

describe_list()
    
  

The length of nums is 1000
First 10 items [76, 25, 97, 83, 80, 1, 6, 77, 65, 9]
Last 5 items [98, 12, 13, 58, 63]


---
### Exercise 2 - show tuple items

Write a function which will:
*   use a loop to print the list of protected characteristics from the `protected_characteristics` tuple.  

*Expected output*:  
age  
disability  
gender reassignment  
marriage and civil partnership  
pregnancy and maternity  
race  
religion or belief  
sex  
sexual orientation  


In [180]:
def print_protected_characteristics():
  for item in protected_characteristics:
    print(item)

print_protected_characteristics()

age
disability
gender reassignment
marriage and civil partnership
pregnancy and maternity
race
religion or belief
sex
sexual orientation


---
### Exercise 3 - display items in the middle

Write a function which will:
*  calculate the position of the middle item in the `characteristics` list   
(*Hint: use len() to help with this*)
*  calculate the position of the item that is 5 places before the middle item
*  calculate the position of the item that is 5 places after the middle item
*  print the part of the list that includes the items from 5 places before to 5 places after.  

*Expected output*:  
Your list will include 10 items.  
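
One way to read the steps above so that they give 10 items is to slice from 5 places before the middle up to, but not including, 5 places after it. The sketch below tries this on a small made-up list (`items`, not the worksheet's `characteristics`). Treating `len(items) // 2` as the middle position is an assumption, since an even-length list has no single middle item.

```python
items = list(range(100, 121))   # hypothetical data: the 21 values 100..120

middle = len(items) // 2        # integer division gives a whole-number position
start = max(middle - 5, 0)      # clamp at 0 so a short list cannot wrap around
end = middle + 5                # a slice stops just before this position

window = items[start:end]

print("Middle position:", middle)
print(window)
```

The `max(..., 0)` guard matters because a negative start position would count back from the end of the list instead of raising an error.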

In [182]:
def print_middle():
  length = len(characteristics)

  # Odd length: one middle item, so both positions below are the same
  # Even length: two middle items, one at each position
  first_middle = (length - 1) // 2
  last_middle = length // 2

  # Clamp at 0 so that a short list cannot produce a negative (wrap-around) position
  five_before = max(first_middle - 5, 0)
  five_after = last_middle + 5

  # Slicing past the end of a list is safe, so no upper clamp is needed
  for item in characteristics[five_before:five_after + 1]:
    print(item)

# Call the function 
# In the case of odd length of list, I thought there should be 11 items printed (middle + 5 before + 5 after) 
# My logic assumed that in the case of even length, there are 2 items in the middle (as opposed to no items), so the total number of items would be 12 (2 from the middle + 5 before + 5 after).
print_middle()

sexual orientation
gender reassignment
disability
race
gender reassignment
race
disability
religion or belief
race
sexual orientation
marriage and civil partnership


---
### Exercise 4 - create a copy

Write a function which will: use a for loop to create a copy of the `nums` list:

*   create a new, empty, list called **new_nums**  (*Hint: an empty list is [ ]*)
*   use a for loop which uses the following syntax:  `for num in nums:`
*   each time round the loop append `num` to `new_nums`  ( *`new_nums.append(num)`*)
*   print the first 10 items of `new_nums`
*   print the first 10 items of `nums`
*   print the length of both lists

*Example expected output*:  
[34, 3, 16, 3, 79, 33, 98, 24, 48, 8]  
[34, 3, 16, 3, 79, 33, 98, 24, 48, 8]  
Length of new_nums: 1000 Length of nums: 1000  

In [186]:
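def create_copy():
  # Build the copy inside the function so repeated calls do not keep extending one shared list
  new_nums = []
  for num in nums:
    new_nums.append(num)
  return new_nums

new_nums = create_copy()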

print(nums[:10])
print(new_nums[:10])
print("Length of nums:", len(nums))
print("Length of new_nums:", len(new_nums))

[76, 25, 97, 83, 80, 1, 6, 77, 65, 9]
[76, 25, 97, 83, 80, 1, 6, 77, 65, 9]
Length of nums: 1000
Length of new_nums: 1000


---
### Exercise 5 - count the occurrence of age in characteristics

Write a function which will use the list method:

`list_name.count(item)`

to count the number of occurrences of 'age' in the `characteristics` list.

Print the result.

*Example expected output*:  
Number of times age appears in list: 42
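
As a quick illustration of how `count()` behaves, the sketch below applies it to a short made-up list (`answers`), including an item that does not appear at all.

```python
answers = ['yes', 'no', 'yes', 'maybe', 'yes']   # hypothetical example list

yes_count = answers.count('yes')      # counts every matching item
never_count = answers.count('never')  # an item that is absent gives 0, not an error

print("Number of times yes appears in list:", yes_count)
print("Number of times never appears in list:", never_count)
```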

In [187]:
def age_counter():
  age_count = characteristics.count("age")
  return age_count

print("Number of times age appears in list:", age_counter())

# Alternatively, without needing to define a whole new function
print("Number of times age appears in list:", characteristics.count("age"))

Number of times age appears in list: 185
Number of times age appears in list: 185


---
### Exercise 6 - sort the nums list

Write a function which will:
*   call the function `get_num_list()` and store the result in a new list called **sort_nums**
*   print the first, and last, 10 items in the `sort_nums` list
*   use the `list_name.sort()` method to sort the `sort_nums` list into ascending order
*   print the first, and last, 10 items again  
*   use the `list_name.sort()` method again to sort the `sort_nums` list into descending order
*   print the first, and last, 10 items again  

*Example expected output*:  
Unsorted [94, 95, 54, 75, 87, 32, 55, 15, 37, 30] ....... [91, 92, 40, 83, 37, 82, 78, 82, 80, 9]  
Ascending [1, 1, 1, 1, 1, 1, 1, 2, 2, 2] ....... [99, 99, 99, 100, 100, 100, 100, 100, 100, 100]  
Descending [100, 100, 100, 100, 100, 100, 100, 99, 99, 99] ....... [2, 2, 2, 1, 1, 1, 1, 1, 1, 1]  
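
A detail worth knowing before using `sort()`: it rearranges the list in place and returns `None`, so its result should not be assigned back to the list. The sketch below shows this on a short made-up list (`scores`).

```python
scores = [42, 7, 99, 7, 15]   # hypothetical example list

result = scores.sort()        # sorts in place; the return value is None
ascending = scores[:]         # keep a snapshot of the ascending order

scores.sort(reverse=True)     # the reverse parameter sorts into descending order
descending = scores[:]

print("Return value of sort():", result)
print("Ascending", ascending)
print("Descending", descending)
```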

In [188]:
def sort_nums_list():
  sort_nums = get_num_list()
  print("Unsorted:", sort_nums[:10], end="...")
  print(sort_nums[(-10):])
  sort_nums.sort()
  print("Ascending:", sort_nums[:10], end="...")
  print(sort_nums[(-10):])
  sort_nums.sort(reverse=True)
  print("Descending: ", sort_nums[:10], end="...")
  print(sort_nums[(-10):])

sort_nums_list()

Unsorted: [43, 75, 37, 71, 71, 37, 22, 61, 21, 69]...[17, 11, 34, 50, 28, 46, 79, 27, 34, 58]
Ascending: [1, 1, 1, 1, 1, 1, 1, 1, 1, 1]...[99, 99, 100, 100, 100, 100, 100, 100, 100, 100]
Descending:  [100, 100, 100, 100, 100, 100, 100, 100, 99, 99]...[1, 1, 1, 1, 1, 1, 1, 1, 1, 1]


---
### Exercise 7 - get statistics (max(), min(), sum() )

Write a function which will:
*   print the maximum and minimum numbers in the `nums` list  
*   print the sum of the `nums` list
*   calculate and print the average of the `nums` list (using `len()` to help)  

*Example expected output*:  
Maximum: 100 Minimum: 1  
Sum 49750  
Average 49.75   
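
The three built-in functions can be tried on a short made-up list first (`values` below is not the worksheet's `nums`), where the results are easy to verify by hand.

```python
values = [4, 8, 15, 16, 23, 42]   # hypothetical example list

largest = max(values)
smallest = min(values)
total = sum(values)
average = total / len(values)     # / always gives a float, even for a whole-number result

print("Maximum:", largest, "Minimum:", smallest)
print("Sum", total)
print("Average", average)
```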


In [189]:
def get_statistics(): 
  print("Maximum: ", max(nums))
  print("Minimum: ", min(nums))
  print("Sum: ", sum(nums))
  print("Average: ", sum(nums)/len(nums))

get_statistics()

Maximum:  100
Minimum:  1
Sum:  50616
Average:  50.616


---
### Exercise 8 - percentage difference

Write a function which will:
*   generate a new list called **ex8_nums** using `get_num_list()`
*   calculate and print the percentage difference between the first number in each list (as a percentage of the number in the nums list) (Hint:  find the difference between the two numbers, divide the difference by the number in `nums` and multiply by 100 - and use abs to ensure that the number is always positive)
*   calculate and print the percentage difference between the last numbers in each list in the same way
*   calculate and print the percentage difference between the middle numbers in each list in the same way.
*   calculate and print the percentage difference between the sums of each list in the same way

*Example expected output*:  
First number difference 0.59 %  
Last number difference 0.97 %  
Middle number difference 0.65 %  
Sum number difference 0.01 %  
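
The calculation described in the hint can be sketched as a small helper. The name `percentage_difference_of` and the sample numbers are made up for illustration. Dividing by the original value is safe here because every number from `get_num_list()` is at least 1.

```python
def percentage_difference_of(new_value, original_value):
  # Size of the difference, as a percentage of the original value
  difference = abs(new_value - original_value)
  return round(difference / original_value * 100, 2)

# Hypothetical sample values: 30 compared with 40, then 45 compared with 40
smaller = percentage_difference_of(30, 40)
larger = percentage_difference_of(45, 40)

print("Difference", smaller, "%")
print("Difference", larger, "%")
```

Using `abs()` means the result is the same whether the new value is above or below the original.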

In [198]:
def percentage_difference():
  ex8_nums = get_num_list()

  def pct_diff(new, original):
    # Absolute difference as a percentage of the number from nums
    return round(abs(new - original) / original * 100, 2)

  # Both lists always have 1000 items (an even length), so each has 2 items in the middle; this uses the second of them
  middle = len(nums) // 2
  print("First number difference", pct_diff(ex8_nums[0], nums[0]), "%")
  print("Last number difference", pct_diff(ex8_nums[-1], nums[-1]), "%")
  print("Middle number difference", pct_diff(ex8_nums[middle], nums[middle]), "%")
  print("Sum number difference", pct_diff(sum(ex8_nums), sum(nums)), "%")

percentage_difference()

First number difference 84.21 %
Last number difference 76.19 %
Middle number difference 88.24 %
Sum number difference 2.75 %


---
### Exercise 9 - characteristic counts

Write a function which will:
*  iterate through the `protected_characteristics` tuple and for each **characteristic**:
    *   count the number of occurrences of that `characteristic` in the `characteristics` list
    *   print the `protected_characteristic` and the **count**    
  
*Example expected output*:

age 100  
disability 120  
gender reassignment 120  
marriage and civil partnership 111  
pregnancy and maternity 103  
race 106  
religion or belief 95  
sex 110  
sexual orientation 113  

Extra learning:  you can read [here](https://thispointer.com/python-how-to-pad-strings-with-zero-space-or-some-other-character/) how to justify the printed characteristic so that the output is organised into two columns as shown below:  
![tabulated output](https://drive.google.com/uc?id=1CCXfX6K5ZeDefnq7vUsqxCDmqvcfY8Mz)
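
The string methods `ljust()` and `rjust()` pad a string with spaces up to a given width, which is enough to line text up in columns. The sketch below uses made-up rows (`rows`) rather than the worksheet's data and works out each column width from the longest entry.

```python
rows = [('apples', 5), ('kiwis', 120)]   # hypothetical example data

# Each column is as wide as its longest entry
name_width = max(len(name) for name, _ in rows)
count_width = max(len(str(count)) for _, count in rows)

# Left-justify the names and right-justify the counts
lines = [name.ljust(name_width) + ' ' + str(count).rjust(count_width)
         for name, count in rows]

for line in lines:
  print(line)
```

Right-justifying the numbers keeps the units, tens and hundreds digits in the same columns, which makes the counts easier to compare.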





In [234]:
def characteristic_counts():
  column1 = "Protected Characteristic"
  column2 = "Frequency"

  # Count each characteristic once so that the column widths and the rows use the same numbers
  counts = [characteristics.count(characteristic) for characteristic in protected_characteristics]

  # Each column is as wide as its longest entry, or its heading if that is longer
  name_width = max(len(column1), len(max(protected_characteristics, key=len)))
  freq_width = max(len(column2), len(str(max(counts))))

  print(column1.ljust(name_width), column2.rjust(freq_width))
  for characteristic, count in zip(protected_characteristics, counts):
    print(characteristic.ljust(name_width), str(count).rjust(freq_width))


characteristic_counts()

Protected Characteristic       Frequency
age                                  185
disability                           225
gender reassignment                  239
marriage and civil partnership       217
pregnancy and maternity              237
race                                 246
religion or belief                   228
sex                                  195
sexual orientation                   215


---
### Exercise 10 - characteristics statistics

Assume that the `characteristics` list was taken from a study of cases that have been taken to court in relation to the Equality Act.  

Write a function which will:

*   find the most common characteristic resulting in court action, from this population
*   print this in a message, e.g. The characteristic with the highest number of court cases is:  *characteristic*
*   print the list of `protected_characteristics`, on one line if possible - see [here](https://www.geeksforgeeks.org/g-fact-25-print-single-multiple-variable-python/)
*   ask the user to enter a characteristic that they would like to see statistics on and use a while loop to continue until the user has entered a valid characteristic
*   print the characteristic, its frequency and the percentage that this frequency is of the whole population.

*Test input*:  
maternity and pregnancy

*Example expected output*:  
The characteristic with the highest number of court cases is: sex  
Protected characteristics:  age disability gender reassignment marriage and civil partnership pregnancy and maternity race religion or belief sex sexual orientation   
Which characteristic would you like to see statistics on? marriage  
Which characteristic would you like to see statistics on? age  
age 473 10.97 %  
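
The "keep asking until the answer is valid" step can be sketched on its own. The function name `ask_until_valid`, its `ask` parameter and the simulated answers below are all made up for illustration. Passing in a stand-in for `input` simply lets the loop be tried without typing anything.

```python
def ask_until_valid(valid_options, ask=input):
  # Ask once, then repeat while the answer is not one of the valid options
  answer = ask("Which characteristic would you like to see statistics on? ")
  while answer not in valid_options:
    answer = ask("Which characteristic would you like to see statistics on? ")
  return answer

# Simulate a user who first types an invalid answer, then a valid one
simulated_answers = iter(['marriage', 'age'])
chosen = ask_until_valid(('age', 'race', 'sex'),
                         ask=lambda prompt: next(simulated_answers))

print(chosen)
```

Called with only the first argument, the function falls back to the real `input()` and waits for the user to type.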


In [233]:
def most_common(items):
  # Return the item that appears most often; set() means each distinct item is counted once
  return max(set(items), key=items.count)

print("The characteristic with the highest number of court cases is:", most_common(characteristics))

print("Protected characteristics:", end=" ")
print(*protected_characteristics, sep=", ")

# Keep asking until the answer is one of the 9 protected characteristics
answer = input("Which characteristic would you like to see statistics on? ")
while answer not in protected_characteristics:
  answer = input("Which characteristic would you like to see statistics on? ")

frequency = characteristics.count(answer)
print(answer, frequency, round(100 * frequency / len(characteristics), 2), "%")

The characteristic with the highest number of court cases is: race
Protected characteristics: age, disability, gender reassignment, marriage and civil partnership, pregnancy and maternity, race, religion or belief, sex, sexual orientation
Which characteristic would you like to see statistics on? sex
sex 195 9.81 %


# Reflection
----

## What skills have you demonstrated in completing this notebook?

I have demonstrated basic knowledge of Python syntax and the ability to use loops and if statements, define and call functions, use list methods, and cast values between types.

## What caused you the most difficulty?

The exercises seemed straightforward once I refreshed my memory when it comes to syntax, although I am unsure whether my approach was always correct. My general pace was quite slow as I hadn't used Python since I first learned the basics of it roughly a year ago.

I also got confused as the description of what is in the provided code doesn't seem to match it:

"nums (a list of between 1 and 1000 random numbers, each number is between 0 and 1000)" -- the range in the code is set to 1000 (not randomized), and integers seem to be chosen randomly from 1 to 100

"characteristics (a list of 5000 random protected_characteristics)" -- in the provided code the length of characteristics is randomized to fall between 1 and 5000