# Introduction to Python



## Python in the wild

Let's look at some Python tricks and best practices.

If you've ever used a programming language like C++ or Java, you might be more familiar with `for` loops looking like this:

```C++
for (int i = 0; i < 100; i++) {
    my_list[i]; // do something with the next item in the list
}
```

In Python, we don't loop through the *indices*, but the items themselves. But what if we want the index as well?

Let's say we have some code that processes orders. We don't just want to process them, we want to print *where we are in the list* so the user knows how the code is doing.

In [None]:
orders = ["AB123", "BC463", "DE853", "FG552", "GH912"]

for order in orders:
    print(order)
    # how do we print something like "processing order 1 of N"?

In [None]:
for idx, order in enumerate(orders):
    # we have TWO loop variables!
    print(idx, order)

Let's fill this in together!

In [None]:
for idx, order in enumerate(orders):
    print(f"Processing order {order} ({} of {})")

SyntaxError: f-string: empty expression not allowed (<ipython-input-1-a49f91481e09>, line 2)
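One possible way to fill in the blanks is sketched below. It assumes we want positions that start at 1 (hence `start=1`, an optional argument of `enumerate`) and uses `len(orders)` for the total. The `messages` list is our own addition so the output is easy to inspect.

```python
orders = ["AB123", "BC463", "DE853", "FG552", "GH912"]

total = len(orders)
messages = []
# start=1 makes the counter begin at 1 instead of 0
for position, order in enumerate(orders, start=1):
    message = f"Processing order {order} ({position} of {total})"
    messages.append(message)
    print(message)
```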

Now we also have a list of customers associated with each order. The first customer references the first order, the second references the second and so on.

In [None]:
customers = ["Karen", "Joe", "Sally", "Robert", "Alex"]

# we could do this:
for idx, order in enumerate(orders):
    customer = customers[idx]
    print(f"Order {order} belongs to {customer}")


In Python there's a better way:

In [None]:
zip(orders, customers)

`zip` takes two (or more) lists and pairs up their items position by position. We don't see the results because `zip` returns a special, lazy `zip` object. This can be looped over, but also converted to a list to preview it:

In [None]:
list(zip(orders, customers))

In [None]:
for order, customer in zip(orders, customers):
    print(f"Order {order} belongs to {customer}")
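`enumerate` and `zip` also combine. In the sketch below (the `lines` and `shorter` names and the `start=1` argument are our additions for illustration), `enumerate` wraps `zip` to give a position plus both items. It also shows that `zip` stops at the shortest input, so lists of different lengths silently lose items.

```python
orders = ["AB123", "BC463", "DE853", "FG552", "GH912"]
customers = ["Karen", "Joe", "Sally", "Robert", "Alex"]

lines = []
# each item from zip is an (order, customer) tuple, unpacked in the loop header
for position, (order, customer) in enumerate(zip(orders, customers), start=1):
    lines.append(f"{position}. Order {order} belongs to {customer}")

print("\n".join(lines))

# zip stops at the shortest input: the third order is dropped
shorter = list(zip(["AB123", "BC463", "DE853"], ["Karen", "Joe"]))
print(shorter)
```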

Another pattern we've seen is creating a list by doing something to items in another:

In [None]:
import string

alphabet = string.ascii_lowercase

uppercase = []
for letter in alphabet:
    uppercase.append(letter.upper())

print(uppercase)

['A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I', 'J', 'K', 'L', 'M', 'N', 'O', 'P', 'Q', 'R', 'S', 'T', 'U', 'V', 'W', 'X', 'Y', 'Z']


Again, there is a *Pythonic* way of doing this with *comprehensions*.

Comprehensions combine for loops and list creation into one line. You'll see them a lot:

In [None]:
uppercase_2 = [letter.upper() for letter in string.ascii_lowercase]

print(uppercase_2)

Any time you want to change all items in a list to make a new one, remember you can use a comprehension!
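Comprehensions can also filter items with a trailing `if`. The example below is a small sketch; the `vowels` string and the variable names are our own choices for illustration.

```python
import string

vowels = "aeiou"

# keep only the vowels, then uppercase them
uppercase_vowels = [
    letter.upper() for letter in string.ascii_lowercase if letter in vowels
]

print(uppercase_vowels)
```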

### Sets

Another useful data structure is a `set`. It is the same as a set in mathematical set theory.

A `set` is an unordered collection of *only unique items*: duplicates are dropped automatically.

In [None]:
fruit = ["banana", "apple", "pineapple", "banana"]

set(fruit)

In [None]:
type(set(fruit))

Sets allow you to:

- find unique items in a collection
- compare lists to see what items overlap

With lists, how would you find which customers are in both list `A` and list `B`?

In [None]:
customers_A = ["Karen", "Joe", "Sally", "Robert", "Walter", "Alex"]
customers_B = ["Murray", "Alex", "Matilda", "Walter"]

We can do it with sets:

In [None]:
set(customers_A).intersection(customers_B)

What about combining all unique names between lists (dropping duplicates)?

In [None]:
set(customers_A).union(customers_B)

Or removing items from list A that are also in list B:

In [None]:
set(customers_A) - set(customers_B)
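The same operations are available as operators, provided both sides are sets: `&` for intersection, `|` for union and `-` for difference. A short sketch follows (sets are unordered, so we use `sorted` to get a predictable list back; the variable names are our own):

```python
customers_A = ["Karen", "Joe", "Sally", "Robert", "Walter", "Alex"]
customers_B = ["Murray", "Alex", "Matilda", "Walter"]

set_A = set(customers_A)
set_B = set(customers_B)

in_both = sorted(set_A & set_B)    # same as set_A.intersection(set_B)
in_either = sorted(set_A | set_B)  # same as set_A.union(set_B)
only_in_A = sorted(set_A - set_B)  # same as set_A.difference(set_B)

print(in_both)
print(in_either)
print(only_in_A)
```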

<h1 style="color: #fcd805">Exercise: Python collection best practices</h1>

We are going to perform some data analysis on two versions of song lists.

We have streaming data from two different periods and we will look at how listening habits have changed.

1. Read in the contents of `songs.json` as a *single string* into a variable called `songs`, and the contents of `songs2.json` into one called `songs2`.

2. Convert the strings into Python objects using the `json` module.

How many songs are in each list?

3. Write a function that takes a song object (a dictionary) as an input and returns the artist and the song title as a single string, separated by a colon `:`.

For the following dictionary:

```python
{
    "endTime" : "2021-11-22 12:05",
    "artistName" : "The Boomtown Rats",
    "trackName" : "I Don't Like Mondays",
    "msPlayed" : 259093
}
```

the expected output would be:

```python
"The Boomtown Rats: I Don't Like Mondays"
```

Test your function to make sure it works on the above example.

4. Apply your function to all the songs in both lists. To do this, write a `for` loop to go through each song and call your function on each dictionary. Save the "artist: title" strings into new lists.

For example, the following list of songs:

```python
[
    {
        "endTime" : "2021-11-22 12:05",
        "artistName" : "The Boomtown Rats",
        "trackName" : "I Don't Like Mondays",
        "msPlayed" : 259093
    },
    {
        "endTime" : "2021-11-22 12:09",
        "artistName" : "The Darkness",
        "trackName" : "I Believe in a Thing Called Love",
        "msPlayed" : 217653
    }
]
```

should produce the following list:

```python
["The Boomtown Rats: I Don't Like Mondays", "The Darkness: I Believe in a Thing Called Love"]
```

Make sure your code produces similar lists before moving on.

5. Now, rewrite the `for` loop as a *list comprehension*. The result should be the same as in the previous step: a list of "artist: title" strings.

6. Use `set`s to determine which songs are in *both* lists.

7. Finally, how many unique artists are there in each list?

*Tip: think through the individual steps needed to answer this, write them as comments if that's helpful*

# Error handling

What happens in Python when something goes wrong? We get an error message!

In [None]:
def add(first, second):
    return first + second

add(1, 2)

In [None]:
add("3", 3)

Usually the error message itself contains a clue to what went wrong.

How can we deal with errors so they don't stop our program running when they occur?

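There are two "schools of thought" around error handling:

- Look Before You Leap (LBYL): check that the operation will work before attempting it
- Easier to Ask Forgiveness than Permission (EAFP): attempt the operation and handle the error if it fails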

Let's see them in action for the previous example:

In [None]:
def add_lbyl(first, second):
    if isinstance(first, (int, float)) and isinstance(second, (int, float)):
        return first + second
    else:
        print("Both arguments must be numbers!")

In [None]:
add_lbyl(1, 2)

In [None]:
add_lbyl("2", 3)

In [None]:
def add_eafp(first, second):
    try:
        return first + second
    except TypeError:
        print("Both arguments must be numbers!")

In [None]:
add_eafp(1, 2)

In [None]:
add_eafp("2", 3)
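The two styles are not always interchangeable. In the sketch below (the functions are redefined so the cell runs on its own), both versions print a message and implicitly return `None` for `"2"` and `3`, but the EAFP version accepts anything that supports `+`, including two strings:

```python
def add_lbyl(first, second):
    if isinstance(first, (int, float)) and isinstance(second, (int, float)):
        return first + second
    print("Both arguments must be numbers!")


def add_eafp(first, second):
    try:
        return first + second
    except TypeError:
        print("Both arguments must be numbers!")


print(add_lbyl("2", 3))    # message, then None (no explicit return)
print(add_eafp("2", 3))    # message, then None
print(add_lbyl("2", "3"))  # rejected: strings are not int or float
print(add_eafp("2", "3"))  # accepted: str + str is valid and gives "23"
```

Which behaviour is preferable depends on whether the function should be strict about types or work with anything that can be added.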

You can even raise errors on purpose!

In [None]:
raise ValueError("SOMETHING BAD HAPPENED")

The type of the error should reflect the kind of problem that occurred.

More on this in the [Python exception documentation](https://docs.python.org/3/library/exceptions.html).
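As a sketch of raising an error on purpose inside a function: the hypothetical `set_age` below rejects negative numbers with a `ValueError`, and the caller decides how to handle it.

```python
def set_age(age):
    # hypothetical validation rule, for illustration only
    if age < 0:
        raise ValueError(f"age must not be negative, got {age}")
    return age


try:
    set_age(-5)
except ValueError as error:
    message = str(error)
    print(f"Could not set age: {message}")
```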

For example, selecting an index outside a valid range gives an `IndexError`:

In [None]:
my_list = [2, 4, 6, 8]

my_list[17]

In your `try`-`except` block you can catch specific error types then default to a case where a different error was encountered:

In [None]:
try:
    my_list = [2, 4, 6, 8]
    my_list[17]
except IndexError:
    print("Index out of range!")
except Exception:  # fallback for any other kind of error
    print("Another kind of error occurred")

There is also the option to add an `else` block for when no exceptions happen, and a `finally` block which runs whether or not an error was encountered previously:

In [None]:
try:
    my_list = [2, 4, 6, 8]
    #my_list[17] # IndexError
    #my_list[1] + "2" # TypeError
    #2 / 0 # ZeroDivisionError
except IndexError:
    print("Index out of range!")
except TypeError:
    print("A TypeError occurred")
except Exception as other:
    print(f"Mystery error: {other} ({type(other)})")
else:
    print("Everything went to plan")
finally:
    print("All done")
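To see which blocks run in which situation, here is a small sketch. The helper `run_blocks` and its `steps` list are hypothetical names; the function records the name of each block as it executes.

```python
def run_blocks(index):
    steps = []
    my_list = [2, 4, 6, 8]
    try:
        my_list[index]
        steps.append("try")
    except IndexError:
        steps.append("except")
    else:
        steps.append("else")
    finally:
        steps.append("finally")
    return steps


print(run_blocks(1))   # no error: ['try', 'else', 'finally']
print(run_blocks(17))  # IndexError: ['except', 'finally']
```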

One other useful keyword is `assert`. It checks whether a condition is true and raises an `AssertionError` if it isn't.

In [None]:
assert 1 == 1

In [None]:
assert 1 == 2

This is useful for sanity checking your code especially while you're still developing it!
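`assert` also accepts an optional message after a comma, which is shown when the check fails. A minimal sketch, using a hypothetical `average` function:

```python
def average(numbers):
    # the message after the comma appears in the AssertionError
    assert len(numbers) > 0, "cannot average an empty list"
    return sum(numbers) / len(numbers)


print(average([2, 4, 6]))

try:
    average([])
except AssertionError as error:
    failure = str(error)
    print(f"Assertion failed: {failure}")
```

Assertions are skipped when Python runs with the `-O` flag, so they suit development-time checks rather than validating user input.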

More on error handling in Python here: https://docs.python.org/3/tutorial/errors.html

<h1 style="color: #fcd805">Exercise: Error handling</h1>

1. Write a function that takes two numeric arguments and divides one by the other. In your function, have error handling to catch division by zero errors and also errors if either argument passed in is not numeric.

2. Write a function that takes in a file path as an argument, opens the file and prints its contents. Explicitly handle a FileNotFoundError so the code doesn't break if the user supplies an invalid file path.

# Regex

We can manipulate strings and find patterns using Python's built-in string functionality, but we can't do a lot of pattern-matching beyond looking for specific strings (with `.find`, for example).

What if, say, we want to extract all numbers from a long string?

We could solve this with loops, checking every character to see if it's numeric, but this would take a long time and be quite complicated.

There are better ways!

Regular Expressions (regex) allow us to **quickly search for patterns in text**.

You need:

- a module that can do the searching (in Python it's called `re`)
- a string to search
- a pattern to look for

The key component to learning regex is the pattern syntax.

To look for a single number, the pattern is composed of two pieces. First, we specify we're looking for numeric characters: `[0-9]` (anything in the range 0 to 9). By default this looks for **one** instance, so we add a `+` modifier to say "one or more instances of that character".

In [None]:
import re

text = """
On the 12th day of Christmas, my true love sent to me
12 drummers drumming,
11 pipers piping,
10 lords a-leaping,
9 ladies dancing,
8 maids a-milking,
7 swans a-swimming,
6 geese a-laying,
5 golden rings,
4 calling birds,
3 French hens,
2 turtle doves,
And a partridge in a pear tree.
"""

pattern = r"[0-9]+" # "r" means "raw string", so Python doesn't treat backslashes as escape characters

matches = re.findall(pattern, text)

matches
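`findall` returns the matches as a list of strings, even when they look like numbers. The short sketch below (using a shortened `text` so the cell stands alone) converts them with a comprehension:

```python
import re

text = "12 drummers drumming, 11 pipers piping, 10 lords a-leaping"

matches = re.findall(r"[0-9]+", text)
print(matches)  # strings: ['12', '11', '10']

# convert each matched string to an integer
numbers = [int(match) for match in matches]
print(sum(numbers))
```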

The `re` module has other functions to look for patterns, but they're more restrictive:

- `search` finds only the *first* match
- `match` finds the pattern if it's at the *start* of the string
- `findall` finds all matches (use this one!)

What are the pros/cons of each approach? Why not always use `findall`?

In [None]:
re.search(pattern, text) # returns a Match object for the first match (or None if there is none)

In [None]:
match = re.match(pattern, text) # returns None if the pattern isn't at the start of the string

type(match)
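`search` and `match` return a `Match` object, or `None` when nothing is found, so it is worth checking the result before using it. A sketch with a shortened `text` (the names `found` and `at_start` are our own):

```python
import re

text = "On the 12th day of Christmas"
pattern = r"[0-9]+"

found = re.search(pattern, text)
if found is not None:
    print(found.group())  # the matched text
    print(found.span())   # (start, end) positions in the string

# match only looks at the very start of the string, so this is None
at_start = re.match(pattern, text)
print(at_start)
```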

We can specify much more complex patterns. Let's say we want to find all lines that start with a number and end in a comma, which essentially extracts the middle of the song.

Our logic is "if a line starts with a number, has any other characters in the middle, and ends with a comma, extract it":

Note: we can use `\d` instead of `[0-9]` to find any digit (this also matches non-Latin digit characters, which `[0-9]` does not):

- `\d+` to match one or more digits
- `.` means "any character except a newline" so `.+` means "one or more of any character"
- `,` to match the comma character literally

In [None]:
new_pattern = r"\d+.+,"

re.findall(new_pattern, text)

Oops, we included the comma at the end (we could remove that manually) but we also included the "12th day of Christmas" part of the first line.

We could modify our pattern to include the "newline" character `\n`:

In [None]:
new_pattern_modified = r"\d+.+,\n" # a raw string still works: the regex engine itself interprets \n as a newline

re.findall(new_pattern_modified, text)

One small modification is that we can specify which *part* of the pattern to extract by putting that part in parentheses (a *capture group*):

In [None]:
best_pattern = r"(\d+.+),\n" # findall returns only the part inside the parentheses

re.findall(best_pattern, text)
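As an alternative sketch, not the approach used above: the `re.MULTILINE` flag makes `^` and `$` match at the start and end of every line, so the pattern can be anchored to whole lines instead of matching `\n` explicitly. The shortened `text` and the names `line_pattern` and `gifts` are for illustration only.

```python
import re

text = """On the 12th day of Christmas, my true love sent to me
12 drummers drumming,
11 pipers piping,
And a partridge in a pear tree.
"""

# ^ and $ anchor to line boundaries because of re.MULTILINE
line_pattern = r"^(\d+.+),$"

gifts = re.findall(line_pattern, text, flags=re.MULTILINE)
print(gifts)
```

Because the pattern must start at the beginning of a line, the "12th day of Christmas" part of the first line is no longer picked up.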

_Tip: If you want to explore regex and test your expressions, use https://regex101.com/ (remember to set the "flavor" of regex to Python on the left)_

Regex can also be used for replacement.

Let's say we want to replace all numbers with a placeholder like `"NUMBER"`. We can use the `re.sub` function:

In [None]:
print(re.sub(pattern, "NUMBER", text))

We can also limit this to a certain number of replacements with the `count` argument:

In [None]:
print(re.sub(pattern, "NUMBER", text, count=2))

<h1 style="color: #fcd805">Exercise: Regular expressions</h1>

We're going to use regular expressions to find different strings in song lyrics.

1. The file `jude.txt` contains the lyrics to "Hey Jude" by the Beatles. Read its contents into a single string.

2. In "raw" Python (not using regular expressions) find how many times the name Jude appears in the song.

3. Now, using regular expressions, find how many times the string "nah" appears. Remember to account for different capitalisations!

_Tip: using https://regex101.com/ to test your regex can be helpful before writing your Python code_

4. Write a regular expression to extract a mobile number from a string.

The rules are:

- the string has to start with a 0
- it has to have 11 digits in total

Optional bonus: the string *could* have a hyphen after the first 5 digits, then after another 3, e.g.:

This is valid: 07123-444-111

As is this: 07123444111

Test your expression to see if it finds the phone number in the following strings:

```python
"""
The vicar's phone number in case of emergency is 07333-999-111
"""

"""
The vicar's phone number in case of emergency is 07333999111
"""
```

BONUS:

5. Write an email address validator function in Python. The function should take in as argument a single string, and use regular expressions to verify the following:

- the string should start with one or more alphanumeric characters
- the string should then contain an `@` character
- finally, the string should then contain one or more alphanumeric characters, a `.` character, followed by exactly 3 letters

Examples:

* `hello@python.org` would be a match
* `@python.org` would NOT (no characters before the `@`)
* `hello123@python.org` would be a match
* `hello123@python.123` would NOT be a match because the final section has to be letters

---

<h1 style="color: #fcd805">Exercise: Writing Python scripts</h1>

Write a rock-paper-scissors game!

Requirements:

- the code should be in a `.py` file
- the only input to the code should be either ROCK, PAPER, or SCISSORS
- the computer also randomly picks one of ROCK, PAPER, or SCISSORS
- the rules are:
    - ROCK beats SCISSORS
    - SCISSORS beat PAPER
    - PAPER beats ROCK
    - otherwise it's a draw
- the code should print the winner and exit

Bonus requirements:

- consider an alternative way of inputting a user's move. If you used `input`, try providing the move using a command line argument, like `python game.py ROCK`. Or vice versa!
- keep score. When a game ends, store the updated result in a file.

_Tip: as usual, think about the logical steps one by one before writing any code!_