# Introduction to Python

## Day 2 - solutions

<h1 style="color: #fcd805">Exercise: Functions</h1>

1. Define a function that calculates and returns the area of a circle based on its radius ($\pi r ^{2}$).

In [None]:
def area(radius):
    PI = 3.1415 # could also use the `math` module which has a built-in pi
    return PI * radius**2

area(14)

615.734
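The comment in the solution mentions the `math` module. A minimal sketch of that variant is below; the name `area_with_math_pi` is made up for illustration. Because `math.pi` carries more digits than `3.1415`, the result differs slightly from the output above.

```python
import math

def area_with_math_pi(radius):
    # math.pi is the standard library's double-precision value of pi
    return math.pi * radius**2

print(area_with_math_pi(14))
```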

2. Define a function that calculates and returns **both** the area *and* the circumference ($2 \pi r$) of a circle based on its radius.

In [None]:
def circle_measurements(radius):
    PI = 3.1415

    # returning as a dictionary, but could be anything
    measurements = {
        "area": PI * radius**2,
        "circumference": 2 * PI * radius
    }

    return measurements

circle_measurements(14)

{'area': 615.734, 'circumference': 87.962}
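As the comment says, the return type "could be anything". One common alternative is a tuple, which the caller can unpack into separate variables. This is only a sketch; `circle_measurements_tuple` is a hypothetical name, not part of the exercise.

```python
def circle_measurements_tuple(radius):
    PI = 3.1415
    # comma-separated return values are packed into a single tuple
    return PI * radius**2, 2 * PI * radius

# unpack the tuple straight into two variables
area_value, circumference_value = circle_measurements_tuple(14)
print(area_value, circumference_value)
```

A dictionary is self-describing, while a tuple relies on the caller remembering the order of the values.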

3. Define a function to calculate and return the **future value of a used car** based on its current value, a percentage depreciation value, and the number of years.

The formula to use is $future\_value = current\_value \times (1 - depreciation\_rate)^{years}$

e.g. a car worth 100,000 with 10% depreciation is worth 90,000 after 1 year, 81,000 after 2 years, and 72,900 after 3 years

In [None]:
def future_value(current_value, pct_depr, n_years):
    return current_value * ((1-pct_depr)**n_years)

future_value(100_000, 0.1, 3)

72900.00000000001

4. Define a function to calculate this same future value, but return **all intermediate values as a list**.

For the previous example, a car worth 100,000 with 10% depreciation over 3 years should return `[90000, 81000, 72900]`

In [None]:
def future_values(current_value, pct_depr, n_years):
    values = []
    # go from year 1 up to (and including) n_years
    for year in range(1, n_years+1):
        year_value = current_value * ((1-pct_depr)**year)
        values.append(year_value)
    return values

future_values(100_000, 0.1, 3)

[90000.0, 81000.0, 72900.00000000001]
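The trailing `00000000001` in the last value comes from floating-point arithmetic, not from the formula. If tidy figures are wanted, one option is to round each value as it is stored. The sketch below assumes two decimal places are enough; `future_values_rounded` and its `ndigits` parameter are hypothetical additions.

```python
def future_values_rounded(current_value, pct_depr, n_years, ndigits=2):
    values = []
    for year in range(1, n_years + 1):
        year_value = current_value * ((1 - pct_depr) ** year)
        # round only the stored value; the calculation itself is unchanged
        values.append(round(year_value, ndigits))
    return values

print(future_values_rounded(100_000, 0.1, 3))
```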

<h1 style="color: #fcd805">Exercise: Function arguments</h1>

1. Write a function to check someone in at the airport. The inputs to the function should be the person's name, their date of birth, and then additional keyword arguments where they can specify document numbers (e.g. passport or national ID). Use `**kwargs` for this.

The function should print a message saying the user has successfully checked in **only** if they provided a passport number OR a national ID number. Check the contents of `kwargs` for these inside the function.

The function usage should be something like this:

```python
check_in(name="David", dob="1970-01-01", passport_number="123")
```

and an unsuccessful check in looks like this (no document IDs provided):

```python
check_in(name="David", dob="1970-01-01")
```

In [None]:
def check_in(name, dob, **kwargs):
    if name != "" and dob != "" and (("passport_number" in kwargs) or ("national_id_number" in kwargs)):
        print("Checked in fine!")
    else:
        print("Some details are missing...")

check_in(name="David", dob="1970-01-01", passport_number="123")
check_in(name="David", dob="1970-01-01")

Checked in fine!
Some details are missing...
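The solution checks only that a document key is present, so `passport_number=""` would still check in. A slightly stricter sketch, using `kwargs.get` and truthiness, is shown below. The helper name `has_valid_document` is hypothetical, and treating a blank string as missing is an assumption rather than part of the exercise.

```python
def has_valid_document(**kwargs):
    # .get returns None for a missing key, and an empty string is falsy,
    # so both "not provided" and "provided but blank" count as missing
    passport = kwargs.get("passport_number")
    national_id = kwargs.get("national_id_number")
    return bool(passport or national_id)

print(has_valid_document(passport_number="123"))
print(has_valid_document(passport_number=""))
print(has_valid_document())
```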


2. Write a function to validate that all *positional* arguments passed in are numeric. Use `*args` for this and check the values inside `args` one at a time. The function should print a success message if all the arguments passed in are numeric.

To keep things simple, we'll assume anything that's an `int` or a `float` is numeric, otherwise it's not.

In Python, to check if an object is a certain type, use `isinstance`:

```python
isinstance(1, int) # this is True
isinstance("1", float) # this is False
```

In [None]:
def validate_numeric(*args):
    for value in args:
        if not isinstance(value, (int, float)):
            print(f"Encountered invalid value {value}. Exiting...")
            return
    # we only get here if no issues encountered
    print("Values all fine")

validate_numeric(1, 2, 3, 4, 5)

Values all fine


In [None]:
validate_numeric(1, 2, 3, "4", 5)

Encountered invalid value 4. Exiting...
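One subtlety worth knowing: `bool` is a subclass of `int`, so `isinstance(True, int)` is `True` and the validator above accepts `True` and `False`. If that is not wanted, a sketch like the following excludes them. The name `all_numeric` is hypothetical, and it returns a boolean instead of printing.

```python
def all_numeric(*args):
    # bool is a subclass of int, so it is excluded explicitly here
    return all(
        isinstance(value, (int, float)) and not isinstance(value, bool)
        for value in args
    )

print(all_numeric(1, 2.5, 3))
print(all_numeric(1, "4", 5))
print(all_numeric(1, True))
```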


<h1 style="color: #fcd805">Exercise: Secret santa</h1>

Let's use everything we've learned so far to create a Secret Santa tool.

Secret Santa is where people put their names in a hat, then everyone draws a name. The name they draw is the person they secretly buy a present for.

#### Rules

- everyone only has one secret santa
- a person cannot be their own secret santa
- the number of participants must be greater than 2

#### Output

The tool should be a single function which:

- takes in a list of names
- outputs the secret santa pairings while observing the above rules
- there should be one item in the output for each name supplied, because everyone is assigned someone to buy a present for

Outputs can be anything: tuples, a list of lists, a dictionary, or even just printed messages.

For example, a list like `["David", "Jeff", "Alice", "Martha"]` could produce a list like:

`[("David", "Jeff"), ("Jeff", "Martha"), ("Alice", "David"), ("Martha", "Alice")]`

#### Optional bonuses

- allow for any number of participants
- error handling (print error messages if any of the above rules are broken)

*Hint: as always, break down the problem into small pieces. Start with the function definition, then write out the logic in plain English using comments to clarify the process in your mind, then fill in the blanks with Python code*

In [None]:
import random

def secret_santa(names):
    if len(names) < 3:
        print("You need at least 3 names for secret santa!")
        return

    # let's start with randomly shuffling the order people draw in, looping through each user,
    # assigning them their partner to buy a gift for, ensuring we select everyone only once

    # shuffle a copy so we don't affect the original list
    givers = names.copy()
    random.shuffle(givers)

    # make a list of pairs to remove from whenever we assign a pair
    pairs = names.copy()

    pairings = []

    for name in givers:
        if pairs == [name]:
            # the last person can only draw themselves, so swap with the previous giver
            previous_giver, previous_pair = pairings.pop()
            pairings.append((previous_giver, name))
            pairings.append((name, previous_pair))
            break

        # in case the first name in the pairs list is this person
        if pairs[0] == name:
            pair = pairs[1]
        else:
            pair = pairs[0]

        pairings.append((name, pair))

        # now remove that person from being a pair again
        pairs.remove(pair)

    return pairings

secret_santa(["David", "Alice", "Jeff", "Martha"])

[('Alice', 'David'),
 ('David', 'Alice'),
 ('Jeff', 'Martha'),
 ('Martha', 'Jeff')]
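Because the output is random, it helps to check a result against the rules rather than eyeball it. The sketch below is one way to do that, assuming names are unique and pairings are `(giver, receiver)` tuples; `is_valid_pairing` is a hypothetical helper, not part of the exercise.

```python
def is_valid_pairing(names, pairings):
    givers = [giver for giver, _ in pairings]
    receivers = [receiver for _, receiver in pairings]
    # nobody may be their own secret santa
    nobody_self = all(giver != receiver for giver, receiver in pairings)
    # everyone gives exactly once and receives exactly once
    everyone_gives = sorted(givers) == sorted(names)
    everyone_receives = sorted(receivers) == sorted(names)
    return nobody_self and everyone_gives and everyone_receives

names = ["David", "Jeff", "Alice", "Martha"]
example = [("David", "Jeff"), ("Jeff", "Martha"), ("Alice", "David"), ("Martha", "Alice")]
print(is_valid_pairing(names, example))
```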

<h1 style="color: #fcd805">Exercise: The Standard Library</h1>

Time to practise using some of the standard library!

1. "Unix time" is a measure of how much time has elapsed since midnight UTC on the 1st of January, 1970. Use the `datetime` library to calculate how many days have elapsed in Unix time.

In [None]:
import datetime

elapsed = datetime.datetime.now() - datetime.datetime(1970, 1, 1)
print(type(elapsed))

elapsed.days

<class 'datetime.timedelta'>


19757
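`datetime.datetime.now()` returns local time without timezone information, so the result above can be off by the local UTC offset near midnight. A sketch of a UTC-aware variant, cross-checked against `time.time()`, might look like this (the variable names are illustrative):

```python
import datetime
import time

# timezone-aware epoch and "now", both in UTC
epoch = datetime.datetime(1970, 1, 1, tzinfo=datetime.timezone.utc)
now_utc = datetime.datetime.now(datetime.timezone.utc)
days_elapsed = (now_utc - epoch).days

# time.time() returns seconds since the epoch; there are 86400 seconds in a day
days_from_seconds = int(time.time() // 86400)

print(days_elapsed, days_from_seconds)
```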

2. There is a file in the `data` folder called `songs.json`, but how many songs does it contain?

- Read its contents into a single string
- Use the `json` module to convert the string to a Python object
- Count the songs!

In [2]:
import json

song_string = ""

with open("data/songs.json", "r") as f:
    song_string = f.read()
song_json = json.loads(song_string)
print(len(song_json))

3. Investigate the `pprint` module to print a nice representation of the songs in the `songs.json` file.

Experiment with the different options and compare how your result looks vs. simply printing the songs.

In [None]:
import pprint

pprint.pprint(song_json, indent=2, width=50)

[ { 'artistName': 'The Boomtown Rats',
    'endTime': '2021-11-22 12:05',
    'msPlayed': 259093,
    'trackName': "I Don't Like Mondays"},
  { 'artistName': 'The Darkness',
    'endTime': '2021-11-22 12:09',
    'msPlayed': 217653,
    'trackName': 'I Believe in a Thing Called '
                 'Love'},
  { 'artistName': 'Meat Loaf',
    'endTime': '2021-11-22 12:22',
    'msPlayed': 508333,
    'trackName': 'Paradise By the Dashboard '
                 'Light'},
  { 'artistName': 'Meat Loaf',
    'endTime': '2021-11-22 12:38',
    'msPlayed': 590000,
    'trackName': 'Bat Out of Hell'},
  { 'artistName': 'Sam Cooke',
    'endTime': '2021-11-27 17:01',
    'msPlayed': 157622,
    'trackName': 'Chain Gang'},
  { 'artistName': 'Steve Harley',
    'endTime': '2021-11-27 17:05',
    'msPlayed': 226776,
    'trackName': 'Make Me Smile (Come up and See '
                 'Me) - 2014 Remaster'},
  { 'artistName': 'Maximo Park',
    'endTime': '2021-11-29 11:36',
    'msPlayed': 200346,
    

4. Write a function that takes in the month number (1-12) and prints the name of the month.

Look into the `calendar` module to help you.

*Bonus: try doing this using the `datetime` module instead*

In [None]:
import calendar

def get_month_name(month_number):
    return calendar.month_name[month_number]

get_month_name(7)

'July'

In [None]:
def get_month_name_dt(month_number):
    # create a date using the month number
    date = datetime.datetime(2024, month_number, 1)
    return date.strftime("%B")

get_month_name_dt(9)

'September'
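Note that `calendar.month_name[0]` is an empty string and an index of 13 raises `IndexError`, so out-of-range input is handled inconsistently. A sketch with explicit validation is below; `get_month_name_checked` is a hypothetical name, and the English month names assume the default locale.

```python
import calendar

def get_month_name_checked(month_number):
    # reject anything outside 1-12 instead of returning '' or raising IndexError
    if not 1 <= month_number <= 12:
        raise ValueError(f"month_number must be between 1 and 12, got {month_number}")
    return calendar.month_name[month_number]

print(get_month_name_checked(7))
```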

5. Write a function that takes in a letter and tells you its position in the alphabet.

Use the `string` module to help you; don't write out the alphabet yourself!

In [None]:
import string

def get_position(letter):
    letter_index = string.ascii_lowercase.find(letter.lower())
    # we should really return the letter's position as 1-indexed
    # so A is 1 not 0
    return letter_index + 1

print(get_position("a"),
      get_position("M"))

1 13
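`str.find` returns `-1` when nothing matches, so the solution returns `0` for input such as `"?"`, and because it searches for substrings, `"ab"` would report position 1. A sketch that returns `None` for anything other than a single ASCII letter is below; `get_position_checked` is a hypothetical name and returning `None` is just one possible choice.

```python
import string

def get_position_checked(letter):
    letter_index = string.ascii_lowercase.find(letter.lower())
    # find matches substrings and returns -1 on failure, so guard both cases
    if len(letter) != 1 or letter_index == -1:
        return None
    return letter_index + 1

print(get_position_checked("M"), get_position_checked("?"))
```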


6. Use the `statistics` library to work out both the **mean** and the **median** of the first 1000 positive integers (so the numbers 1-1000 inclusive).

In [None]:
import statistics

numbers = list(range(1, 1001))

print(statistics.mean(numbers), statistics.median(numbers))

500.5 500.5
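The mean and median agree because the numbers are evenly spaced: both equal the midpoint of the smallest and largest values, $(1 + 1000) / 2$. A quick sketch to confirm this (variable names are illustrative):

```python
import statistics

numbers = range(1, 1001)

# for evenly spaced values, the mean and median both equal the midpoint of the ends
midpoint = (numbers[0] + numbers[-1]) / 2

print(midpoint, statistics.mean(numbers), statistics.median(numbers))
```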


<h1 style="color: #fcd805">Exercise: Pub names</h1>

Let's do some data analysis with Python!

We're going to find out what the most common pub name is in the UK.

In the `data` folder is a file containing a database of pubs (originally from https://www.getthedata.com/open-pubs).

1. First, read its contents into a **list** of rows. How many pubs does the file contain?

In [None]:
lines = []

with open("data/open_pubs.csv", "r") as f:
    lines = f.readlines()

print(len(lines))

51331


In [3]:
lines[0]

2. Look at a single row of your list. Write a function to extract just the **name** of the pub based on a single row.

For example, for the input `"22","Anchor Inn","Upper Street, Stratford St Mary, COLCHESTER","CO7 6LW","604749","234404","51.970379","0.979340","Babergh"` the function should return the string `"Anchor Inn"` (without the extra quotation marks)

In [None]:
def extract_pub_name(row):
    values = row.split(",")
    return values[1].replace('"', '')

extract_pub_name('"22","Anchor Inn","Upper Street, Stratford St Mary, COLCHESTER","CO7 6LW","604749","234404","51.970379","0.979340","Babergh"')

'Anchor Inn'
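Splitting on every comma works for this row because the name itself contains no comma. If a name did contain one, `split(",")` would cut it short. The standard library's `csv` module understands quoted fields, so a sketch along these lines may be more robust. `extract_pub_name_csv` is a hypothetical name, and the second row is invented purely to illustrate the comma case.

```python
import csv

def extract_pub_name_csv(row):
    # csv.reader accepts any iterable of lines and respects quoted commas
    fields = next(csv.reader([row]))
    return fields[1]

real_row = '"22","Anchor Inn","Upper Street, Stratford St Mary, COLCHESTER","CO7 6LW","604749","234404","51.970379","0.979340","Babergh"'
made_up_row = '"1","Fox, Hounds and Friends","1 High Street"'  # hypothetical example row

print(extract_pub_name_csv(real_row))
print(extract_pub_name_csv(made_up_row))
```

Switching the main solution to this parser could change the counts shown later, since names containing commas would no longer be truncated.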

3. Create a new empty list and populate it with pub names by using your function on all of the rows in the data.

In [None]:
pub_names = []

for line in lines:
    pub_names.append(extract_pub_name(line))

print(len(lines), len(pub_names))

51331 51331


In [None]:
pub_names[-1]

'Y Tai'

4. At this point, you should have a list containing only the names of the pubs, corresponding to a single column in the original data file.

We want to make sure we treat different versions of the same pub name as the same thing. There is a mix of lower and upper case strings in the file, so we will standardise this.

We will also remove the word "the" so that a pub called "The King's Head" will be treated as having the same name as one that's simply called "King's Head".

Create a new list, this time containing the pub names in **all uppercase** and with the word `"the"` removed.

*Tip: take care not to replace words that **contain** the word `the` like "theatre"*

In [None]:
uppercase_pubs = []

for name in pub_names:
    upper_name = name.upper()
    # the extra space means we don't remove the start of words like "theatre"
    # (note: it would still match the end of a word such as "lathe ")
    uppercase_pubs.append(upper_name.replace("THE ", ""))

print(len(lines), len(uppercase_pubs))

51331 51331


In [None]:
uppercase_pubs[-1]

'Y TAI'
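Replacing the substring `"THE "` also matches the tail of any word ending in "the", so a name like "Lathe Inn" would become "LAINN". One way around this is to compare whole words. The sketch below assumes words are separated by whitespace; `normalise_pub_name` is a hypothetical name, and as a side effect it collapses repeated spaces.

```python
def normalise_pub_name(name):
    words = name.upper().split()
    # drop only words that are exactly "THE", leaving "THEATRE" or "LATHE" intact
    return " ".join(word for word in words if word != "THE")

print(normalise_pub_name("The King's Head"))
print(normalise_pub_name("Theatre Bar"))
print(normalise_pub_name("Lathe Inn"))
```

Using this instead of `replace` could shift some of the counts shown below, so treat it as an alternative to experiment with.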

5. Now we're ready to count!

Create an empty dictionary to store pub name counts in. The *keys* will be the names themselves, and the *values* will be the number of pubs with that name. The final result will be a bigger version of something like this:

```python
{
    "KING'S HEAD": 47,
    "BRASENOSE ARMS": 1
}
```

For each pub name you encounter, either:

- add the pub name to the dictionary with a value of 1 (corresponding to the first time we see a pub name)
- if the pub is already in the dictionary, increment the count of the corresponding key

In [None]:
pub_counts = {}

for pub_name in uppercase_pubs:
    if pub_name in pub_counts:
        pub_counts[pub_name] += 1
    else:
        pub_counts[pub_name] = 1

pub_counts

{'ANCHOR INN': 69,
 'ARK BAR RESTAURANT': 1,
 'BLACK BOY': 10,
 'BLACK HORSE': 86,
 'BLACK LION': 31,
 'BREWERS ARMS': 19,
 'BRISTOL ARMS': 2,
 'CAFFEINE LOUNGE': 1,
 'CARRIERS ARMS': 3,
 'CHESTNUT TREE FARM': 1,
 'COCK & BELL': 2,
 'EIGHT BELLS INN': 2,
 'FINEZZA PIZZA': 1,
 'FOX AND HOUNDS': 55,
 'GLEMSFORD SOCIAL CLUB': 1,
 'GROVER AND ALLEN - J D WETHERSPOON': 1,
 'HARE AND HOUNDS': 44,
 'HINTLESHAM & CHATTISHAM SOCIAL CLUB': 1,
 'KINGS HEAD': 130,
 'KINS HEAD INN': 1,
 'LONG MELFORD INN': 1,
 'MALDON GREY': 1,
 'NEEDHAM MARKET BOWLS CLUB': 1,
 'NORTH STREET TAVERN': 1,
 'OYSTER REACH BEAFEATER': 1,
 'PLOUGH & FLEECE': 1,
 'RED LION': 335,
 'ROYAL HARWICH YACHT CLUB': 1,
 'ROYAL OAK': 286,
 'SHOTLEY VINEYARD': 1,
 'SILKWORM': 1,
 'SIX BELLS': 27,
 'SUDBURY INSTITUTE CLUB': 1,
 'SUFFOLK CHEFS LTD': 1,
 'ANGEL GLEMSFORD LTD': 1,
 'ANGEL INN': 51,
 'BAY HORSE': 33,
 'BEAGLE': 3,
 'BREWERY TAP': 13,
 'BROOK INN': 5,
 'BULL': 31,
 'BUSH INN': 13,
 'BUTT & OYSTER': 1,
 'CASE IS ALTERED':

6. Using the dictionary you've just created, find the **most common pub name**. This will be the *key* that corresponds to the highest *value* in the dictionary.

*Hint: as you go through the `items` in the dictionary, keep track of the highest count and replace it every time you encounter a higher one. Be sure to also track the corresponding key, so you know which pub name the highest count belongs to!*

In [None]:
top_count = 0
most_popular_pub = ""

for pub, count in pub_counts.items():
    if count > top_count:
        most_popular_pub = pub
        top_count = count

print(most_popular_pub, top_count)

RED LION 335
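The loop above is the approach the hint describes. For comparison, the built-in `max` can do the same search when given a `key` function. The sketch uses a small sample dictionary (three counts copied from the output above) so that it runs on its own.

```python
# a small sample of the counts, so the snippet is self-contained
pub_counts_sample = {"RED LION": 335, "ROYAL OAK": 286, "KINGS HEAD": 130}

# max iterates over the keys and compares them by their counts
most_popular = max(pub_counts_sample, key=pub_counts_sample.get)

print(most_popular, pub_counts_sample[most_popular])
```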


*BONUS: can you solve the counting part using something from the `collections` module?*

**https://docs.python.org/3/library/collections.html**

In [None]:
from collections import Counter

Counter(uppercase_pubs)

Counter({'RED LION': 335,
         'ROYAL OAK': 286,
         'CROWN INN': 199,
         'NEW INN': 181,
         'WHITE HART': 167,
         'KINGS ARMS': 157,
         'ROYAL BRITISH LEGION': 149,
         'SHIP INN': 140,
         'KINGS HEAD': 130,
         'PLOUGH INN': 130,
         'QUEENS HEAD': 127,
         'WHITE HORSE': 122,
         'CROWN': 109,
         'ROSE AND CROWN': 108,
         'PRINCE OF WALES': 106,
         'WHEATSHEAF': 103,
         'SWAN INN': 102,
         'BELL INN': 102,
         'RAILWAY INN': 89,
         'SUN INN': 88,
         'PLOUGH': 88,
         'BLACK HORSE': 86,
         'STAR INN': 85,
         'SWAN': 82,
         'WHITE SWAN': 80,
         'WHITE HORSE INN': 79,
         'RISING SUN': 79,
         'CROSS KEYS': 75,
         'THREE HORSESHOES': 72,
         'BULLS HEAD': 71,
         'WHITE LION': 71,
         'NAGS HEAD': 70,
         'GOLDEN LION': 70,
         'MASONS ARMS': 70,
         'ANCHOR INN': 69,
         'GEORGE INN': 68,
        
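`Counter` also provides `most_common`, which returns the highest counts directly. A minimal sketch on a made-up list of names (not the real data):

```python
from collections import Counter

# hypothetical sample data, just to keep the snippet self-contained
sample_names = ["RED LION", "ROYAL OAK", "RED LION", "CROWN INN", "RED LION", "ROYAL OAK"]

counts = Counter(sample_names)

# most_common(n) returns the n highest (name, count) pairs, highest first
top_two = counts.most_common(2)
print(top_two)
```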