# Introduction to Python

# Functions

Python comes 'bundled' with built-in functions like `len`, `print`, `sum` etc.

Functions are pieces of logic that can have **0 or more inputs** and **0 or 1 outputs** (multiple outputs are possible but they have to be in a single container like a list or tuple).

In [1]:
def capital_print(message):
    print(message.upper())

In [2]:
print("my message")

capital_print("my message")

my message
MY MESSAGE


This was a function that took one input (called `message`) and performed an operation on it (printed the uppercase version).

It works for *any* input value (as long as it can be uppercased, i.e. a `str` type).

We can also pass a variable to a function:

In [None]:
my_name = "David"

capital_print(my_name)

DAVID


Just like the `print` function, `capital_print` **does something** (prints a message) but it doesn't **output** anything, such as the result of a calculation.

## Return values

Consider this function:

In [None]:
def minimum_deposit(house_value):
    deposit = house_value * 0.1
    print(f"The minimum deposit on a house worth {house_value} is {deposit}")
    return deposit

In [None]:
dep = minimum_deposit(150_000)  # notice we can split up large numbers with an underscore!

The minimum deposit on a house worth 150000 is 15000.0


In [None]:
dep

15000.0

Not only did this function print a result, it also **returned** it. That means we can use the result of this calculation elsewhere.

In [None]:
my_deposit = minimum_deposit(200_000)
print(my_deposit)
type(my_deposit)

The minimum deposit on a house worth 200000 is 20000.0
20000.0


float

Incidentally, the return type of a function that returns nothing is `NoneType` (the type of Python's `None`, which indicates a missing value).

In [None]:
type(print("Test"))

Test


NoneType

In [None]:
type(capital_print("Test"))

TEST


NoneType

## Multiple inputs

Functions can take any number of inputs, including zero.

In [None]:
def meaning_of_life():
    print("The answer to the meaning of life, the universe, and everything is... 42")

In [None]:
meaning_of_life()

The answer to the meaning of life, the universe, and everything is... 42


To take multiple inputs, we just list them in the function definition, separated by commas.

In [None]:
def calculate_interest(amount, pct_interest):
    interest = amount * pct_interest
    print(f"Interest of {pct_interest} on {amount} is {interest}")
    return interest

In [None]:
my_interest = calculate_interest(1000, 0.43)
print(my_interest)
type(my_interest)

Interest of 0.43 on 1000 is 430.0
430.0


float

Python is **dynamically typed** meaning we don't have to specify what `type` our inputs need to be.

Best practices for writing functions include:

- descriptive function name and variable names
- writing documentation to let users know how to use your function
- checking for erroneous values, such as whether values that should be numbers are actually numbers
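
The last point can be sketched with `isinstance`, which checks whether a value is of a given type. The function name `checked_interest` is made up for this illustration, and printing a message and returning `None` is just one simple way to react to bad input.

```python
def checked_interest(amount, pct_interest):
    # Both inputs must be numbers (int or float) for the multiplication to make sense
    if not isinstance(amount, (int, float)) or not isinstance(pct_interest, (int, float)):
        print("Both amount and pct_interest must be numbers")
        return None
    return amount * pct_interest


print(checked_interest(1000, 0.5))    # a valid call
print(checked_interest("1000", 0.5))  # the amount is a string, so we get None back
```

In larger programs it is common to raise an error instead of returning `None`, but the check itself would look the same.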

### Docstrings

Docstrings in Python are strings placed at the top of a function body that document how it works. They are written by the creator of the function.

More information about docstrings: [Python Docstring Conventions](https://peps.python.org/pep-0257)

In [None]:
def calculate_interest(amount, pct_interest):
    """Calculate the amount of interest based on an amount and an interest rate.

    Arguments:
    `amount` - the total amount
    `pct_interest` - the interest rate, as a percentage expressed as a decimal between 0 and 1
    """
    interest = amount * pct_interest
    print(f"Interest of {pct_interest} on {amount} is {interest}")
    return interest

Now you can get "built-in" help!

In [None]:
help(calculate_interest)

Help on function calculate_interest in module __main__:

calculate_interest(amount, pct_interest)
    Calculate the amount of interest based on an amount and an interest rate.
    
    Arguments:
    `amount` - the total amount
    `pct_interest` - the interest rate, as a percentage expressed as a decimal between 0 and 1



## Returning multiple values

Technically, Python allows for exactly **one** returned value. However, this can be a collection of values, like a tuple, list, or dictionary.

In [None]:
def name_variants(name):
    return name, name.lower(), name.upper()

In [None]:
name_variants("David")

('David', 'david', 'DAVID')

In [None]:
a, b, c = name_variants("David")

In [None]:
a

'David'

In [None]:
b

'david'

In [None]:
c

'DAVID'

Notice that a series of comma-separated values in Python is a `tuple` by default.

`return name, name.lower(), name.upper()` is the same as `return (name, name.lower(), name.upper())`

We could have returned those values as a list, or a dictionary, or anything else.
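
As a small sketch of the dictionary option, the hypothetical `name_variants_dict` below returns the same three values with labels. The key names are arbitrary choices for this example.

```python
def name_variants_dict(name):
    # Return the same three values, but labelled, so callers don't rely on their order
    return {"original": name, "lower": name.lower(), "upper": name.upper()}


variants = name_variants_dict("David")
print(variants["upper"])
```

A dictionary is still a single returned value; it just lets the caller pick out results by name rather than by position.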

<h1 style="color: #fcd805">Exercise: Functions</h1>

1. Define a function that calculates and returns the area of a circle based on its radius ($\pi r ^{2}$).

2. Define a function that calculates and returns **both** the area *and* the circumference ($2 \pi r$) of a circle based on its radius.

3. Define a function to calculate and return the **future value of a used car** based on its current value, a percentage depreciation value, and the number of years.

The formula to use is $future\_value = current\_value \times (1 - depreciation\_rate)^{years}$

e.g. a car worth 100,000 with 10% depreciation is worth 90,000 after 1 year, 81,000 in year 2, and 72,900 in year 3

4. Define a function to calculate this same future value, but return **all intermediate values as a list**.

For the previous example, a car worth 100,000 with 10% depreciation over 3 years should return `[90000, 81000, 72900]`

## Keyword and positional arguments

We saw previously that when we *used* a function, we didn't have to specify which value corresponds to which argument.

In the `calculate_interest(amount, pct_interest)` function we just **knew** that the first argument had to be the amount, and the second had to be the interest rate. In this usage, we call them **positional arguments** (we specify them based on their *position* in the original function definition).

We could have been explicit and named which argument we're passing values for. This is called a **keyword argument**, and it lets us mix the order of arguments:

In [None]:
calculate_interest(amount=100_000, pct_interest=0.2)

Interest of 0.2 on 100000 is 20000.0


20000.0

In [None]:
calculate_interest(pct_interest=0.2, amount=100_000)

Interest of 0.2 on 100000 is 20000.0


20000.0

**Note: once you start using keyword arguments, you *must* keep using them!**

In [None]:
calculate_interest(amount=100_000, 0.02)

SyntaxError: positional argument follows keyword argument (<ipython-input-28-99c5bed13c61>, line 1)

In [None]:
calculate_interest(amount=100_000, pct_interest=0.02)

Interest of 0.02 on 100000 is 2000.0


2000.0

However, using keyword arguments *after* positional arguments is fine:

In [None]:
calculate_interest(100_000, pct_interest=0.02)

Interest of 0.02 on 100000 is 2000.0


2000.0

### Default arguments

Arguments can also have defaults that can be overwritten.

Let's revisit the mortgage example. Most banks require a 10% deposit, so we hard-coded this rate in our function.

However, we want to be able to specify a different deposit rate when it's not 10%, without having to pass the rate every time.

This can be done with a default argument value:

In [None]:
def minimum_deposit(house_value, rate=0.1):
    deposit = house_value * rate
    print(f"The minimum deposit on a house worth {house_value} is {deposit} ({rate*100}%)")
    return deposit

In [None]:
minimum_deposit(100_000)

The minimum deposit on a house worth 100000 is 10000.0 (10.0%)


10000.0

In [None]:
minimum_deposit(100_000, 0.15)

The minimum deposit on a house worth 100000 is 15000.0 (15.0%)


15000.0
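
Default arguments combine naturally with keyword arguments. The sketch below repeats the function so it runs on its own, and overrides the default by name; the 20% rate is just an illustrative value.

```python
def minimum_deposit(house_value, rate=0.1):
    deposit = house_value * rate
    print(f"The minimum deposit on a house worth {house_value} is {deposit} ({rate*100}%)")
    return deposit


# Override the default by name, which makes the call easier to read
deposit = minimum_deposit(100_000, rate=0.2)
```

Naming the argument makes it obvious at the call site what the second value means.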

## Unlimited/unknown arguments

What if you want a variable number of arguments in your function?

### Example 1: unknown input size

Currently the built-in `sum` function in Python requires a collection like a list to work:

In [None]:
sum([1, 2, 3, 4, 5])

15

What if we wanted to create a sum function that doesn't require the items to be in a list?

This would mean we need to anticipate the number of values up front!

We could do something like this:

```python
def my_sum(value1, value2, value3=0, value4=0):
```

but we'd have to add a lot of empty values to account for people wanting to sum a lot of numbers!

Clearly there must be a better way to do this... and there is!

In [None]:
def my_sum(*args):
    total = 0
    for number in args:
        total += number
    return total

In [None]:
my_sum(1, 2, 3, 4, 5, 7, 6, 8, 9)  # items don't need to be in a list, we can pretend they're all individual arguments

45

In this case, `args` is a tuple of values, but the caller doesn't have to wrap them in a list or tuple.

In [None]:
def inspect_args(*args):
    print(type(args))
    print(args)

inspect_args(1, 2, 3, 4, 5)

<class 'tuple'>
(1, 2, 3, 4, 5)


Using `*args` we can have a **variable** number of *positional* arguments.

A key underpinning of this functionality is the idea of **unpacking** where we can take individual items in a list and assign them to multiple variables at once.

In [None]:
names = ["Arthur", "C", "Clarke"]

first_name = names[0]
middle_initial = names[1]
surname = names[2]

print(first_name, middle_initial, surname)

Arthur C Clarke


In [None]:
names = ["Arthur", "C", "Clarke"]

first_name, middle_initial, surname = names

print(first_name, middle_initial, surname)

The `*args` in Python is a way to indicate that positional arguments should be "packed" into a single tuple.
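
The same `*` symbol also works in the other direction when *calling* a function: it unpacks an existing list or tuple into separate positional arguments. A minimal sketch, with `my_sum` repeated so the example runs on its own:

```python
def my_sum(*args):
    total = 0
    for number in args:
        total += number
    return total


values = [1, 2, 3, 4, 5]

# The * unpacks the list, so this is the same as my_sum(1, 2, 3, 4, 5)
result = my_sum(*values)
print(result)
```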

We can also have a variable number of *keyword* arguments using `**kwargs`.

For example, a function that can print a person's details, no matter what they are:

In [None]:
def print_details(name, age, **kwargs):
    print(f"Person: {name}")
    print(f"Age: {age}")

    for key in kwargs: # using a for loop on a dictionary loops through the keys!
        print(f"{key}: {kwargs[key]}")



In [None]:
print_details("Alice", 30)


Person: Alice
Age: 30


In [None]:
print_details("Alice", 30, dogs_name="Bingo", occupation="Firefighter")

Person: Alice
Age: 30
dogs_name: Bingo
occupation: Firefighter


With `**kwargs`, all *keyword* arguments are packed into a **dictionary**, regardless of how many there are.
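
This can be checked in the same way as for `*args`. `inspect_kwargs` is a hypothetical helper written only for this check:

```python
def inspect_kwargs(**kwargs):
    print(type(kwargs))
    print(kwargs)
    return kwargs


details = inspect_kwargs(dogs_name="Bingo", occupation="Firefighter")
```

The argument names become the dictionary keys (as strings) and the passed values become the dictionary values.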

You can check if a particular keyword argument has been passed by checking if the key is in the dictionary:

```python
if "my_key" in kwargs:
    ...
```

If you ever don't know exactly how many arguments you'll need, remember `*args` and `**kwargs`!

<h1 style="color: #fcd805">Exercise: Function arguments</h1>

1. Write a function to check someone in at the airport. The inputs to the function should be the person's name, their date of birth, and then additional keyword arguments where they can specify document numbers (e.g. passport or national ID). Use `**kwargs` for this.

The function should print a message saying the user has successfully checked in **only** if they provided a passport number OR a national ID number. Check the contents of `kwargs` for these inside the function.

The function usage should be something like this:

```python
check_in(name="David", dob="1970-01-01", passport_number="123")
```

and an unsuccessful check in looks like this (no document IDs provided):

```python
check_in(name="David", dob="1970-01-01")
```

2. Write a function to validate that all *positional* arguments passed in are numeric. Use `*args` for this and check the values inside `args` one at a time. The function should print a success message if all the arguments passed in are numeric.

To keep things simple, we'll assume anything that's an `int` or a `float` is numeric, otherwise it's not.

In Python, to check if an object is a certain type, use `isinstance`:

```python
isinstance(1, int) # this is True
isinstance("1", float) # this is False
```

# Modules

In reality, we don't usually have to reinvent the wheel when it comes to functionality.

For most tasks there is probably already a built-in way to do it!

Functionality (like helpful functions) can be bundled together in a **module**.

Python comes with lots of existing modules pre-installed, called **the standard library**.

Here is the entire Python standard library: **https://docs.python.org/3/library/index.html**. There's a lot there - bookmark the page!

---

Let's say we wanted to simulate a dice roll. How do we introduce randomness? Python has a library for that!

To use a library, we need to:

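- **install** the library in our Python environment (standard library modules such as `random` come with Python, so this step only applies to third-party libraries)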
- **import** the library so our current code file is aware of it

We only need to install a library once, but we need to import it in every file we want to use it in.

In [None]:
import random

In [None]:
def dice_roll():
    return random.randint(1, 6)

In [None]:
dice_roll()

3

The `random` module can also randomly choose from a list

In [None]:
numbers = [2, 4, 6, 8, 10]
random.choice(numbers)

4

Or shuffle a list

In [None]:
print(numbers)

random.shuffle(numbers)

print(numbers)

[2, 4, 6, 8, 10]
[2, 8, 6, 4, 10]
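
Because the results are random, re-running the cells above will most likely give different values each time. When repeatable results are needed (for example while testing), the `random` module can be given a fixed starting point with `random.seed`. A small sketch, where the seed value 42 is an arbitrary choice:

```python
import random

random.seed(42)
first_roll = random.randint(1, 6)

random.seed(42)  # resetting the seed restarts the same sequence
second_roll = random.randint(1, 6)

print(first_roll == second_roll)
```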


We could also give the library an **alias**. Some libraries have a preferred way of importing them, e.g. `import pandas as pd`

In [None]:
import random as ran

ran.randint(1, 6)

2

In both cases, notice how we imported the `random` module first, and then referred to it every time we wanted to use one of its functions.

We could have also done this:

In [None]:
from random import randint

# now we can use randint as if it was just a built-in function
randint(1, 6)

6

The third (and BAD) option is to just import everything from a library directly.

In [None]:
from random import *

In [None]:
randrange(1, 6) # like randint, but the last number is EXCLUDED so (1, 6) is actually the integers 1-5, like range()

This is bad because function names might conflict with each other if lots of functions are imported from lots of libraries.

Best practice is to **either**:

- import just the function(s) you need
- import just the library and use the prefix every time (like `random.randint`). This also makes it clear where the functions come from!
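
The naming clash described above can be shown with two standard library modules that both define a function called `sqrt`: `math` and `cmath` (the latter, for complex numbers, is not otherwise used in this notebook). A small sketch:

```python
import math
import cmath

# With prefixes, both functions can coexist and it is clear which is which
real_root = math.sqrt(4)
complex_root = cmath.sqrt(4)
print(real_root, complex_root)

# Without prefixes, the later import silently replaces the earlier one
from math import sqrt
from cmath import sqrt

clashing_root = sqrt(4)
print(type(clashing_root))
```

No error or warning is raised when the second `sqrt` replaces the first, which is what makes this kind of conflict easy to miss.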

<h1 style="color: #fcd805">Exercise: Secret santa</h1>

Let's use everything we've learned so far to create a Secret Santa tool.

Secret Santa is where people put their names in a hat, then everyone draws a name. The name they draw is the person they secretly buy a present for.

#### Rules

- everyone only has one secret santa
- a person cannot be their own secret santa
- the number of participants must be greater than 2

#### Output

The tool should be a single function which:

- takes in a list of names
- outputs the secret santa pairings while observing the above rules

Outputs can be anything: tuples, a list of lists, a dictionary, or even just printed messages.

For example, a list like `["David", "Jeff", "Alice", "Martha"]` could produce a list like:

`[("David", "Jeff"), ("Jeff", "Martha"), ("Alice", "David"), ("Martha", "Alice")]`

#### Optional bonuses

- allow for any number of participants
- error handling (print error messages if any of the above rules are broken)

*Hint: as always, break down the problem into small pieces. Start with the function definition, then write out the logic in plain English using comments to clarify the process in your mind, then fill in the blanks with Python code*

# The Standard Library Tour

Let's look at some helpful examples in the standard library.

https://docs.python.org/3/library/index.html

### Dates

One common task is to work with dates. There is a lot of helpful functionality in the `datetime` library:

In [None]:
import datetime

One confusing aspect is that `datetime` contains date, time, and datetime (which has both date and time components) functionality, so you will often see `datetime.datetime`:

In [None]:
datetime.datetime.today()

Technically, `datetime.datetime` is a *class* defined inside the `datetime` module; the two simply share a name.

This is where it helps to import just what you need, so this is quite common:

In [None]:
from datetime import datetime

In [None]:
datetime.today()

We can calculate differences between dates:

In [None]:
diff = datetime(2024, 1, 1) - datetime(2023, 1, 1)

type(diff)

In [None]:
diff

In [None]:
diff.days
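
Subtracting two datetimes gives a `timedelta` object, and a `timedelta` can also be added to a datetime to move it forwards or backwards. A minimal sketch with an arbitrary 30-day gap:

```python
from datetime import datetime, timedelta

start = datetime(2024, 1, 1)

# Adding a timedelta to a datetime gives a new datetime
later = start + timedelta(days=30)
print(later)

# Subtracting two datetimes gives a timedelta back
print((later - start).days)
```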

We can also convert strings to dates and vice versa.

For this, we need to specify the *format* the date comes in (which parts are the year, month, day etc.).

Here's a useful reference: **https://strftime.org**

In [None]:
datetime.strptime("2024-01-01 00:00:00", "%Y-%m-%d %H:%M:%S")

If we have a datetime object, we can also print it in whatever format we like:

In [None]:
today = datetime.today()

today.strftime("%A %d %B")
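
Putting the two together, `strptime` and `strftime` can be used to convert a date string from one format to another. The input string and both formats below are made up for illustration:

```python
from datetime import datetime

# Parse a day/month/year string...
parsed = datetime.strptime("01/02/2024", "%d/%m/%Y")

# ...and write it back out as year-month-day
reformatted = parsed.strftime("%Y-%m-%d")
print(reformatted)
```

A handy way to remember which is which: str**p**time **p**arses a string, str**f**time **f**ormats a datetime.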

### JSON

A common data storage format is JavaScript Object Notation (JSON).

This is an example of JSON:

```json
{
    "endTime" : "2021-12-02 11:19",
    "artistName" : "David Bowie",
    "trackName" : "Rebel Rebel - 2016 Remaster",
    "msPlayed" : 274746
}
```

Look familiar?

We can use the `json` library to convert a JSON string (which looks a lot like a Python dictionary) to the equivalent Python object:

In [None]:
import json

song_string = """{
    "endTime" : "2021-12-02 11:19",
    "artistName" : "David Bowie",
    "trackName" : "Rebel Rebel - 2016 Remaster",
    "msPlayed" : 274746
}"""

song = json.loads(song_string)

song

In [None]:
type(song)

In [None]:
song["artistName"]

Likewise, you can convert a Python object to its JSON representation (depending on the object):

In [None]:
phone_numbers = [
    {
        "David": "123-4567",
        "Jenny": "867-5309",
        "Simon": "000-0000"
    }
]

numbers_string = json.dumps(phone_numbers)
numbers_string

In [None]:
type(numbers_string)

Not everything can be converted to JSON though. It has to be explicitly "serializable":

In [None]:
json.dumps(datetime.today())

But it works for many use cases that don't require complex objects!
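
For values like dates, two common workarounds are to convert them to strings before dumping, or to pass a fallback function via the `default` parameter of `json.dumps`. A sketch of both, using a made-up `event` dictionary:

```python
import json
from datetime import datetime

event = {"name": "New Year", "when": datetime(2024, 1, 1)}

# Option 1: convert the datetime to a string ourselves before dumping
event_as_text = {"name": event["name"], "when": event["when"].isoformat()}
print(json.dumps(event_as_text))

# Option 2: let json call str() on anything it cannot serialize
event_json = json.dumps(event, default=str)
print(event_json)
```

Either way the date ends up as a plain string in the JSON, so it needs to be parsed again (for example with `strptime`) when the data is loaded back.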

### File I/O

Let's look at another useful library, `os`. One of the things it lets us do is traverse/manipulate the file system.

A common use case is listing all files in a directory:

In [None]:
import os

os.listdir("data")

[]

This gives us file names, but not full paths. We can use the `os.path` submodule to build full file paths:

In [None]:
current_dir = os.getcwd()
current_dir

'/content'

In [None]:
data_files = os.listdir("data")

for file in data_files:
    full_path = os.path.join(current_dir, "data", file)
    print(full_path)

/content/data/.ipynb_checkpoints
/content/data/demo.txt


We can check if a file exists:

In [None]:
data_file = os.path.join(current_dir, "data", "demo.txt")

os.path.exists(data_file)

True
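
A directory listing can contain folders as well as files (entries such as `.ipynb_checkpoints` are typically folders). A small sketch that keeps only regular files, using `os.path.isfile`; it lists the current working directory so that it does not depend on the `data` folder:

```python
import os

current_dir = os.getcwd()

files_only = []
for entry in os.listdir(current_dir):
    full_path = os.path.join(current_dir, entry)
    # os.path.isfile is True for regular files and False for directories
    if os.path.isfile(full_path):
        files_only.append(full_path)

print(files_only)
```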

## Handling files

How do we actually open the file?

Python has a handy `open` function for us (no module import required):

In [None]:
with open(data_file, "r") as f:
    lines = f.readlines()
    print(type(lines))
    print(lines)

In [None]:
with open(data_file, "r") as f:
    contents = f.read()
    print(type(contents))
    print(contents)

This opens the file in "read mode" (`"r"`) and gives us a file object `f` that we can manipulate.

The `with` statement lets us do all our file manipulation knowing that Python will "close" the file afterwards.

Otherwise, we'd have to remember to call `f.close()` every time (and if we forget we might "lock" the file making it temporarily unusable!).
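
The closing behaviour can be observed through the file object's `closed` attribute. The sketch below creates a throwaway file with the `tempfile` module so that it does not depend on the `data` folder; the file name and contents are made up.

```python
import os
import tempfile

# Create a throwaway file so the example does not depend on the data folder
with tempfile.TemporaryDirectory() as tmp_dir:
    demo_path = os.path.join(tmp_dir, "demo.txt")

    with open(demo_path, "w") as f:  # "w" is write mode
        f.write("Alice\nBob\n")

    with open(demo_path, "r") as f:
        contents = f.read()
        print(f.closed)  # still inside the with block, so the file is open

    closed_after = f.closed
    print(closed_after)  # the with block has ended, so the file is closed
```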

So a typical pattern is:

In [None]:
names = []

with open(data_file, "r") as f:
    names = f.readlines()

print(names)

Every line ends with a `\n`, a newline character, like the `Enter` key has been pressed. We could of course strip these out.

In [None]:
names[0]

In [None]:
names[0].replace("\n", "")

In [None]:
names[0].strip()

In [None]:
cleaned_names = []

for name in names:
    cleaned_names.append(name.strip())

cleaned_names

Writing to a file only requires us to change the mode to "write" (`"w"`) and call the appropriate methods:

In [None]:
capital_names = []

for name in names:
    capital_names.append(name.upper())

capital_names

In [None]:
with open("data/capital_names.txt", "w") as f:
    f.writelines(capital_names)
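
Note that `writelines` writes the strings exactly as given and does not add newline characters. That works above because the items came from `readlines` and so still carry their `\n`. With stripped names, the newline has to be added back, as in this sketch (made-up names, written to a temporary folder):

```python
import os
import tempfile

cleaned_names = ["ALICE", "BOB", "CAROL"]  # made-up names with no trailing "\n"

with tempfile.TemporaryDirectory() as tmp_dir:
    out_path = os.path.join(tmp_dir, "capital_names.txt")

    with open(out_path, "w") as f:
        for name in cleaned_names:
            f.write(name + "\n")  # add the newline back so each name gets its own line

    with open(out_path, "r") as f:
        written_lines = f.readlines()

print(written_lines)
```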

<h1 style="color: #fcd805">Exercise: The Standard Library</h1>

Time to practise using some of the standard library!

1. "Unix time" is a measure of how much time has elapsed since the 1st of January, 1970. Use the `datetime` library to calculate how many days have elapsed in Unix time.

2. There is a file in the `data` folder called `songs.json`, but how many songs does it contain?

- Read its contents into a single string
- Use the `json` module to convert the string to a Python object
- Count the songs!

3. Investigate the `pprint` module to print a nice representation of the songs in the `songs.json` file.

Experiment with the different options and compare how your result looks vs. simply printing the songs.

4. Write a function that takes in the month number (1-12) and prints the name of the month.

Look into the `calendar` module to help you.

*Bonus: try doing this using the `datetime` module instead*

5. Write a function that takes in a letter and tells you its position in the alphabet.

Use the `string` module to help you, don't write out an alphabet yourself!

6. Use the `statistics` library to work out both the **mean** and the **median** of the first 1000 positive integers (so the numbers 1-1000 inclusive).

<h1 style="color: #fcd805">Exercise: Pub names</h1>

Let's do some data analysis with Python!

We're going to find out what the most common pub name is in the UK.

In the `data` folder is a file containing a database of pubs (originally from https://www.getthedata.com/open-pubs).

1. First, read its contents into a **list** of rows. How many pubs does the file contain?

2. Look at a single row of your list. Write a function to extract just the **name** of the pub based on a single row.

For example, for the input `"22","Anchor Inn","Upper Street, Stratford St Mary, COLCHESTER","CO7 6LW","604749","234404","51.970379","0.979340","Babergh"` the function should return the string `"Anchor Inn"` (without the extra quotation marks)

3. Create a new empty list and populate it with pub names by using your function on all of the rows in the data.

4. At this point, you should have a list containing only the names of the pubs, corresponding to a single column in the original data file.

We want to make sure we treat different versions of the same pub name as the same thing. There is a mix of lower and upper case strings in the file, so we will standardise this.

We will also remove the word "the" so that a pub called "The King's Head" will be treated as having the same name as one that's simply called "King's Head".

Create a new list, this time containing the pub names in **all uppercase** and with the word `"the"` removed.

*Tip: take care not to replace words that **contain** the word `the` like "theatre"*

5. Now we're ready to count!

Create an empty dictionary to store pub name counts in. The *keys* will be the names themselves, and the *values* will be the number of pubs with that name. The final result will be a bigger version of something like this:

```python
{
    "KING'S HEAD": 47,
    "BRASENOSE ARMS": 1
}
```

For each pub name you encounter, either:

- add the pub name to the dictionary with a value of 1 (corresponding to the first time we see a pub name)
- if the pub is already in the dictionary, increment the count of the corresponding key

6. Using the dictionary you've just created, find the **most common pub name**. This will be the *key* that corresponds to the highest *value* in the dictionary.

*Hint: as you go through the `items` in the dictionary, keep track of the highest count and replace it every time you encounter a higher one. Be sure to also track the corresponding key, so you know which pub name the highest count belongs to!*

*BONUS: can you solve the counting part using something from the `collections` module?*

**https://docs.python.org/3/library/collections.html**