# Python Basics

In this notebook, we will recap some of the basics of using Python. We will see variables and lists, functions and methods, as well as how to control the flow of our programmes using if statements and for loops etc. This will also give us an opportunity to explore the differences between Spyder and Jupyter Notebook.

Throughout, you are encouraged to alter the code as you go and experiment. If a question naturally arises, try to answer it. You will get much better at Python if you allow yourself to pursue your curiosity. For those of you who have already worked with Python (and are happy to do so), this will likely be unnecessary. However, it might be worth skimming through the notebook to ensure there is nothing unexpected. It will be assumed throughout the course that you are happy with everything in this notebook.

To run a cell, use CTRL + Enter. Finally, while this lab is being run in Jupyter Notebook, you are expected to create and run your files in Spyder. Using Notebook is simply an opportunity to show you Jupyter, but the goal is for you to work with Spyder long-term.

## 1.1 Why Python?

Python is a popular and powerful open-source _interpreted_ programming language. By an interpreted language, we mean one that the user does not need to _compile_ into machine code before running it; instead, this is all handled behind the scenes. Although compiled languages like C are usually much faster than interpreted languages, they are arguably less user-friendly and require more low-level programming to develop complex data processing algorithms. That said, Python provides easy access to powerful computational functions that call into underlying compiled C libraries.

While Python was not specifically designed with data analysis or scientific computing in mind, it lends itself excellently to these tasks, not least because of its ability to work with and visualise large data sets. This ability stems from a large and active ecosystem of third-party packages:

- NumPy - for manipulation of homogeneous array-based data;
- pandas - for manipulation of heterogeneous and labelled data;
- SciPy - for common scientific computing tasks;
- Matplotlib - for publication-quality visualisations;
- IPython - for interactive execution and sharing of code;
- Scikit-Learn - for machine learning.

We will see all of these in this course. In particular, we are seeing IPython now. It is an interactive shell built with Python. To quote the project website, it provides a 'rich toolkit to help you make the most out of using Python'. We do not need to alter our code to use it, since the IPython shell runs standard Python code just like the normal interpreter; it simply adds a few extra features (see https://ipython.readthedocs.io/en/stable/interactive/tutorial.html).

Jupyter got its start as part of the IPython project and has since grown into a project aiming to provide a 'tool for the entire life cycle of research computing'. These are the words of Fernando Pérez, who created the IPython programming environment in 2001. We can think of the two as follows: if Python is the engine of our data science task, Jupyter is the interactive control panel.

Finally, Jupyter Notebook is a browser-based environment and perhaps the most familiar interface provided by the Jupyter project. It is useful for development, collaboration, sharing and even publication of data science results.

Before getting down to business, let's start by seeing the most famous of all starting points in programming.

In [1]:
print('Hello, world!')

Hello, world!


Copy this code into a file named _hello\_world.py_ and run it in Spyder.

## 1.2 Variables, Lists and Dictionaries

### 1.2.1 Variables

A variable is typically described as a 'box' in which we can store values. While this is a good way to think of variables in the beginning, it is better in the long term to think of variables as labels that we attach to values. We can also say that a variable references a certain value. This distinction probably won't matter at first, but it is worth learning sooner rather than later: with an inaccurate picture of how variables work, we will eventually be surprised by how a variable behaves.
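A minimal sketch of the 'label' picture, using the made-up names `first` and `second`: both names are attached to the same list, so changing the list through one name is visible through the other, while reassigning one name leaves the other untouched.

```python
# Two labels attached to the same list object
first = [1, 2, 3]
second = first          # no copy is made; 'second' labels the same list

second.append(4)        # modify the list through the second label
print(first)            # [1, 2, 3, 4]: the change is visible through 'first' too
print(first is second)  # True: both labels refer to one object

# Reassigning a label moves it to a new object; the other label is unaffected
second = [9, 9]
print(first)            # [1, 2, 3, 4]
```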

When naming a variable, we need to adhere to a few rules and guidelines. Breaking rules will cause errors, while breaking guidelines may just lead to messy code. We briefly consider the following:

- Variable names can contain only letters, numbers and underscores.
- Variable names can start with a letter or an underscore, but not with a number, e.g. _message\_1_ but not _1\_message_.
- Spaces are not allowed in variable names, e.g. _greeting\_message_ but not _greeting message_.
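- Do not use Python keywords, i.e. words that Python has reserved for a particular programmatic purpose, such as `for` or `if`, as variable names; this causes a syntax error. Also avoid the names of built-in functions such as `print`: this is allowed, but it hides the built-in function.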
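As a quick check of these rules, the sketch below uses the standard library's `str.isidentifier()` method and `keyword.iskeyword()` function. Note that `isidentifier()` only checks the spelling rules and does not reject keywords, so the two checks complement each other.

```python
import keyword

# str.isidentifier() checks the spelling rules (letters, digits, underscores, no leading digit)
print('message_1'.isidentifier())         # True
print('1_message'.isidentifier())         # False: starts with a digit
print('greeting message'.isidentifier())  # False: contains a space

# keyword.iskeyword() tells us whether a name is reserved by Python
print(keyword.iskeyword('for'))    # True: using it as a variable name is a SyntaxError
print(keyword.iskeyword('print'))  # False: print is a built-in function, not a keyword
```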

In [2]:
message = 'Hello, world!'
print(message)

Hello, world!


Every variable is connected to a _value_, which is the information associated with that variable. Above, we created a variable named _message_ whose value is the string 'Hello, world!'. Adding the variable gives the Python interpreter a little more work: it first processes the first line, associating the variable _message_ with the text 'Hello, world!', and then processes the second line, printing the value associated with _message_.

Most programmes define and gather data of some sort before doing something useful with that data. As such, it helps to classify different types of data. The main data types are as follows:

- A _string_ (`str`) is a series of characters and is written inside quotes (single or double, but be consistent!).

`'This is a string'`

`"This is also a string"`
- An _integer_ (`int`) is a signed (i.e. positive or negative) whole number of any size.
- A _floating point value_ (`float`) is a real number (really, we should say an approximation of a real number) characterised by an integer value followed by a decimal point and the fractional part. These can be written in scientific notation (e.g. `a = 1.0e6` to represent one million, or simply `b = 2.0` to represent the whole number two). These are called _floating_ points because, as far as the computer is concerned, the decimal point can appear anywhere in the string of digits.
- A _boolean value_ (`bool`) is simply `True` or `False` (sometimes represented by the integers 1 and 0, respectively).
- A _null value_ (`None`) is a special Python value: it is the only value of type `NoneType` and represents the absence of a value. It has various applications but can be useful as a placeholder when the exact value of a variable is not yet known or defined.
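The short sketch below (the list name `examples` is just for illustration) prints the type of one value of each kind, and demonstrates three of the points above: integers can be of any size, floats are only approximations of real numbers, and booleans behave like the integers 1 and 0.

```python
# One value of each of the main data types
examples = ['a string', 42, 2.0, 1.0e6, True, None]
for value in examples:
    # type(value).__name__ gives the type's name, e.g. 'int'
    print(repr(value), type(value).__name__)

# Integers can be arbitrarily large
big = 2 ** 100
print(big)                # 1267650600228229401496703205376

# Floats are approximations of real numbers
print(0.1 + 0.2 == 0.3)   # False
print(0.1 + 0.2)          # 0.30000000000000004

# Booleans behave like the integers 1 and 0
print(True + True)        # 2
```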

__N.B.__ Some languages require us to define the type of variable before assigning a value (or a piece of data) to it. While Python does not require this, it is still important to be aware of the data type.

In [3]:
a, b = 1, 2.0    # This is how we can assign values to multiple variables on the same line

# If we are unsure of an object's data type, we can simply enquire.
print(type(a))
print(type(b))
print(type(message))

# We can convert a value to a different data type
a = float(a)
print(type(a))

<class 'int'>
<class 'float'>
<class 'str'>
<class 'float'>


In [4]:
# But we cannot convert an arbitrary string to a float
message_float = float(message)

# This only works if the string represents a number, e.g. '3.14', or is one of the special strings 'nan' and 'inf'

ValueError: could not convert string to float: 'Hello, world!'

The special cases 'nan' and 'inf' stand for 'not a number' and 'infinity', respectively. A NaN represents a floating point value that is undefined or unrepresentable, such as the result of subtracting infinity from infinity. In data science, NaNs often represent missing or corrupt data. An inf represents an infinite value.
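A minimal sketch of which strings `float()` accepts, and of one surprising property of NaN: it is not equal to anything, including itself, so the standard library's `math.isnan()` is the reliable way to test for it.

```python
import math

# Strings that represent numbers can be converted
print(float('3.14'))   # 3.14
print(float('1e6'))    # 1000000.0

# The special strings 'nan' and 'inf' are also accepted
x = float('nan')
y = float('inf')
print(x, y)            # nan inf

# NaN is not equal to anything, including itself, so test with math.isnan
print(x == x)          # False
print(math.isnan(x))   # True
print(y > 1e308)       # True: infinity is larger than any finite float
```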

In some situations, we want to use a variable's value inside a string. For example, we might have two variables, one representing a first name and one representing a last name. We want to combine these values to display a person's full name.

In [5]:
first_name = 'Ada'
last_name = 'Lovelace'
full_name = f'{first_name} {last_name}'
print('Hello,',full_name)    # Use a comma to print more than one input

Hello, Ada Lovelace


Save the above code as _name.py_.

To insert a variable's value into a string, we place the letter `f` immediately before the opening quotation mark and then put braces around the name or names of any variable we want to use inside the string. Python will then replace each variable with its value when the string is displayed. These strings are called _f-strings_ where the _f_ is for _format_ because Python formats the string by replacing the name of any variable in braces with its value.

There are lots of ways to template or format a string. We will briefly discuss the mechanics of one of the main interfaces here. String objects have a `format` method that can be used to substitute formatted arguments into the string, producing a new string. Below, we have the following:
- `{0:.2f}` means to format the first argument as a floating-point number with two decimal places.
- `{1:s}` means to format the second argument as a string.
- `{2:d}` means to format the third argument as an exact integer.

In [6]:
template = '{0:.2f} {1:s} are worth US${2:d}'

To substitute arguments for these format parameters, we pass a sequence of arguments to the `format` method.

In [7]:
template.format(4.55560, 'Argentine Pesos', 1)

'4.56 Argentine Pesos are worth US$1'
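The same format specifications also work inside the braces of an f-string, after a colon. A minimal sketch; the variable names `rate`, `currency` and `dollars` are made up for illustration.

```python
rate = 4.55560
currency = 'Argentine Pesos'
dollars = 1

# The specifications after the colon match those used with str.format
via_format = '{0:.2f} {1:s} are worth US${2:d}'.format(rate, currency, dollars)
via_fstring = f'{rate:.2f} {currency:s} are worth US${dollars:d}'

print(via_fstring)                # 4.56 Argentine Pesos are worth US$1
print(via_format == via_fstring)  # True
```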

Sometimes we want to take in a string and 'spit out' a list whose entries are the words in the string. This can be done using the `.split()` method.

In [8]:
string = 'This is a test string. We can turn this into a list.'

In [9]:
string.split()

['This',
 'is',
 'a',
 'test',
 'string.',
 'We',
 'can',
 'turn',
 'this',
 'into',
 'a',
 'list.']

By default (with no argument), this method splits on any run of whitespace (spaces, tabs and newlines). However, we can split on other characters, such as letters or punctuation, simply by passing the separator as an argument.

In [10]:
string.split('.')

['This is a test string', ' We can turn this into a list', '']

In [11]:
string.split('e')

['This is a t', 'st string. W', ' can turn this into a list.']
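The default behaviour differs from splitting on a single space, as this sketch (with a made-up string `text`) shows. It also uses `str.join`, which does the reverse of `split`.

```python
text = 'one  two\tthree\n'

# With no argument, split() breaks on any run of whitespace and drops empty strings
print(text.split())            # ['one', 'two', 'three']

# With an explicit ' ' separator, every single space counts, so empty strings appear
print('one  two'.split(' '))   # ['one', '', 'two']

# str.join reverses the process, gluing a list of strings together
words = text.split()
print(' '.join(words))         # one two three
```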

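To learn more, refer to the string methods documentation at https://docs.python.org/3/library/stdtypes.html#string-methods and the `string` module documentation (including the format-string syntax) at https://docs.python.org/3/library/string.html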

__Exercises:__
- Update your _hello\_world.py_ file to contain the variable _message_, as above. Then alter your code to read `print(mesage)`. Run your code and observe what happens. Next, change the value of the variable and run the code again.
- Explore the use of `.title()`, `.upper()` and `.lower()` on strings. What do they do? Try using them in _name.py_.
- Explore the use of `\t` and `\n` in adding whitespace to strings. Again, try using this in _name.py_ or _hello\_world.py_.
- Explore the use of `.rstrip()`, `.lstrip()` and `.strip()` in removing whitespace from strings. As ever, use these in either of your _.py_ files.

### 1.2.2 Lists

Lists allow us to store sets of information in one place, whether it is just a few items or millions. These are one of Python's most powerful features and tie together many important concepts in programming. The simplest way to create a list is to type out each item of the list, enclosed in square brackets (`[ ]`).

In [12]:
# An empty list
an_empty_list = []

# A non-empty list
discworld = ['Death', 'Rincewind', 'The Luggage', 'The Librarian', 'Granny Weatherwax', 'sam vimes']

# Print the list
print(discworld)

# Access individual elements of the list
print('The first element is', discworld[0])
print('The third element is', discworld[2])
print('The last element is', discworld[-1].title())

# A list with a mixture of data types
list1 = [1, 'a', 2, 'b', 3, 'c']

['Death', 'Rincewind', 'The Luggage', 'The Librarian', 'Granny Weatherwax', 'sam vimes']
The first element is Death
The third element is The Luggage
The last element is Sam Vimes


A list can even have another list as an item. Lists have different 'methods' which allow us to manipulate the data inside them. An exhaustive overview can be found in https://docs.python.org/3/tutorial/datastructures.html

An important method from a practical standpoint is the 'append' method. This allows us to add items to lists and does so by modifying the list as opposed to creating a new one. This is often called an 'in place' method.

We can also use 'extend' to extend one list with another or even add a new element at a specific position in our list using 'insert'. For the latter, we need to specify the position and what we want to insert as two different arguments.

In [13]:
discworld.append('Cohen the Barbarian')
print(discworld)

discworld.insert(1,'Ysabel')
print('\n\t',discworld)

another_list = [4, 'd', 5, 'e', 6, 'f']
list1.extend(another_list)
print('\n\t\t',list1)

['Death', 'Rincewind', 'The Luggage', 'The Librarian', 'Granny Weatherwax', 'sam vimes', 'Cohen the Barbarian']

	 ['Death', 'Ysabel', 'Rincewind', 'The Luggage', 'The Librarian', 'Granny Weatherwax', 'sam vimes', 'Cohen the Barbarian']

		 [1, 'a', 2, 'b', 3, 'c', 4, 'd', 5, 'e', 6, 'f']
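One subtlety worth checking for yourself: `append` adds its argument as a single item, even if that argument is a list, whereas `extend` adds each element separately. Like other in-place methods, both return `None`. A small sketch with a throwaway list `numbers`:

```python
numbers = [1, 2, 3]
numbers.append([4, 5])    # adds ONE item: the whole list [4, 5]
print(numbers)            # [1, 2, 3, [4, 5]]
print(len(numbers))       # 4

numbers = [1, 2, 3]
numbers.extend([4, 5])    # adds each item of [4, 5] separately
print(numbers)            # [1, 2, 3, 4, 5]

# In-place methods return None, so do not reassign their result to the list
result = numbers.append(6)
print(result)             # None
```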


We can do much more with lists than simply adding (or removing - see exercises below) items. We can also determine the length of a list easily using the `len()` function, or sort using the `sort()` method. If we wanted to maintain the original order of a list but present it in a sorted order, we can instead use the `sorted()` function. This lets us display our list in a particular order but does not affect the actual order of the list.

In [14]:
n = len(discworld)
discworld.sort()
print('There are',n,'characters in the Discworld list. In alphabetical order, they are as follows:\n\n', discworld)

There are 8 characters in the Discworld list. In alphabetical order, they are as follows:

 ['Cohen the Barbarian', 'Death', 'Granny Weatherwax', 'Rincewind', 'The Librarian', 'The Luggage', 'Ysabel', 'sam vimes']
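Note that 'sam vimes' ends up last above because uppercase letters sort before lowercase ones. The sketch below, using a shorter made-up list, contrasts `sorted()` with `sort()` and shows the `key` and `reverse` parameters, which both accept.

```python
characters = ['Rincewind', 'Death', 'sam vimes', 'The Librarian']

# sorted() returns a new list and leaves the original untouched
print(sorted(characters))   # ['Death', 'Rincewind', 'The Librarian', 'sam vimes']
print(characters)           # ['Rincewind', 'Death', 'sam vimes', 'The Librarian']

# key=str.lower compares lowercase versions, so capitalisation is ignored
print(sorted(characters, key=str.lower))  # ['Death', 'Rincewind', 'sam vimes', 'The Librarian']

# sort() reorders the list in place; reverse=True gives descending order
characters.sort(reverse=True)
print(characters)           # ['sam vimes', 'The Librarian', 'Rincewind', 'Death']
```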


__Exercises__
- Save the discworld list as _discworld.py_. Change each name to one written entirely in lowercase. Then, print a message which greets each member (using titled capitalisation).
- Explore the `del` statement and use it to remove 'Rincewind' from the discworld list.
- Compare `del` to the `pop()` method. How does it differ?
- Compare `del` and `pop()` to the `remove()` method. How does it differ?
- Explore the `.remove()` method.
- Create a file called _dinner\_guests.py_. In this file, define a list of guests you would like to invite to dinner. Print a message indicating the number of dinner guests invited. Then, add two more guests and remove one of the original guests. Print the updated number of guests to be invited. Finally, print a message to each dinner guest inviting them to dinner.


### 1.2.3 Dictionaries

Python dictionaries allow us to connect pieces of related information. More specifically, a dictionary is a collection of values, each labelled by a unique 'key'; we look values up by key rather than by position. We define dictionaries with curly brackets (as opposed to square brackets for lists or parentheses for tuples, which are left as an exercise to explore). Dictionaries let us store related information in a structured way. For example, we might want to store information about an individual person (name, age, address, profession, and so on), using a descriptive key such as 'name' or 'age' for each piece of information.

In [15]:
empty_dict = {}

dict1 = {'a' : 'some value', 'b' : [1,2,3,4]}

dict1

{'a': 'some value', 'b': [1, 2, 3, 4]}

`dict1` has keys 'a' and 'b'. If we want to see these keys, we can use the `.keys()` method. We can also add a new key simply by assigning a value to it.

In [16]:
keys = dict1.keys()
print(keys)

dict1['c'] = (1,2)
print(dict1)

dict_keys(['a', 'b'])
{'a': 'some value', 'b': [1, 2, 3, 4], 'c': (1, 2)}


We can check if a dictionary contains a key using the same syntax used for checking whether a list (or tuple) contains a value:

In [17]:
'b' in dict1

True

In [18]:
'd' in dict1

False

We can delete values either using the `del` keyword or the `pop` method (which simultaneously returns the value and deletes the key):

In [19]:
dict1[5] = 'some value'
dict1

{'a': 'some value', 'b': [1, 2, 3, 4], 'c': (1, 2), 5: 'some value'}

In [20]:
dict1['dummy'] = 'another value'
dict1

{'a': 'some value',
 'b': [1, 2, 3, 4],
 'c': (1, 2),
 5: 'some value',
 'dummy': 'another value'}

In [21]:
del dict1[5]
dict1

{'a': 'some value', 'b': [1, 2, 3, 4], 'c': (1, 2), 'dummy': 'another value'}

In [22]:
ret = dict1.pop('dummy')
ret

'another value'

In [23]:
dict1

{'a': 'some value', 'b': [1, 2, 3, 4], 'c': (1, 2)}

The `.keys()` and `.values()` methods give us views of the dictionary's keys and values, respectively. Since Python 3.7, dictionaries preserve the order in which keys were inserted, and these methods output the keys and values in the same order, so the _n_th key corresponds to the _n_th value.

In [24]:
list(dict1.keys())

['a', 'b', 'c']

In [25]:
list(dict1.values())

['some value', [1, 2, 3, 4], (1, 2)]
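A short sketch of insertion order, with a made-up dictionary `scores`. It also shows the `.items()` method, which pairs each key with its value, and that these views reflect later changes to the dictionary.

```python
scores = {'b': 2, 'a': 1, 'c': 3}

# Keys come back in insertion order, not sorted order
print(list(scores.keys()))    # ['b', 'a', 'c']
print(list(scores.values()))  # [2, 1, 3]

# .items() yields (key, value) pairs, keeping each key beside its value
print(list(scores.items()))   # [('b', 2), ('a', 1), ('c', 3)]

# The views are live: they reflect later changes to the dictionary
keys = scores.keys()
scores['d'] = 4
print(list(keys))             # ['b', 'a', 'c', 'd']
```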

We can merge one dictionary into another using the `.update()` method. This modifies the dictionary in place, so any existing keys in the data passed to `update` will have their old values discarded.

In [26]:
dict1.update({'b': 'abc', 'c': 12})
dict1

{'a': 'some value', 'b': 'abc', 'c': 12}

A set in Python is an unordered collection of unique elements.

In [27]:
some_list = [1, 1, 1, 2, 3, 4, 5, 5, 6, 6, 6, 7]
some_set = set(some_list)
print(some_set)

{1, 2, 3, 4, 5, 6, 7}


By creating a set from the list, we have efficiently found all the unique elements in it. We could then turn this back into a list object.

In [28]:
some_list  = [1,1,1,2,3,3,4,5,5,6,6,6,7]
some_set = set(some_list)
unique_list = list(some_set)

print(some_set)
print(unique_list)

{1, 2, 3, 4, 5, 6, 7}
[1, 2, 3, 4, 5, 6, 7]


In general we can define a set between curly brackets, but must be careful not to get confused with dictionaries. This is particularly dangerous when wanting to create the empty set. If we naively wrote `{ }` we would in fact end up with an empty dictionary.
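A short sketch of the difference, using the built-in `set()` to create an empty set; the names `empty_dict`, `empty_set` and `primes` are just for illustration.

```python
empty_dict = {}
empty_set = set()
print(type(empty_dict))  # <class 'dict'>
print(type(empty_set))   # <class 'set'>

# A non-empty set can be written with curly brackets
primes = {2, 3, 5, 7}
primes.add(3)            # adding an existing element changes nothing
primes.add(11)
print(len(primes))       # 5
print(4 in primes)       # False
```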

__Exercise__
- Use a dictionary to store information about a person you know. Store their first name, last name, age and the city in which they live. You should have keys such as _first\_name_, _last\_name_, _age_ and _city_. Print each piece of information stored in your dictionary.

## 1.3 Binary Operations and Logical Comparisons

Most of the binary mathematical operations and logical comparisons are as we might expect. A list is as follows:

- `a + b` means 'add `a` and `b`'
- `a - b` means 'subtract `b` from `a`'
- `a * b` means 'multiply `a` by `b`'
- `a / b` means 'divide `a` by `b`'
- `a // b` means 'floor divide `a` by `b`', i.e. round the quotient down to the nearest integer
- `a ** b` means 'raise `a` to the power of `b`'
- `a & b` means 'True if both `a` and `b` are True; for integers we take the bitwise `AND`'
- `a | b` means 'True if either `a` or `b` is True; for integers, take the bitwise `OR`'
- `a ^ b` means 'True if `a` or `b` are true, but not both (for booleans); for integers, take the bitwise `Exclusive OR`'
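A few of these operators are worth trying with negative numbers and with integers in binary. The sketch below shows that floor division rounds down (towards minus infinity) rather than simply dropping the fractional part, and how `&`, `|` and `^` work bit by bit on integers.

```python
print(7 // 2)     # 3
print(-7 // 2)    # -4: rounded down, not towards zero
print(2 ** 10)    # 1024

# With booleans, &, | and ^ act as logical AND, OR and exclusive OR
print(True & False, True | False, True ^ True)   # False True False

# With integers they act bit by bit: 6 is 0b110 and 3 is 0b011
print(6 & 3)      # 2  (0b010)
print(6 | 3)      # 7  (0b111)
print(6 ^ 3)      # 5  (0b101)
```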

In [29]:
5-7

-2

In [30]:
(42+17)/3

19.666666666666668

In [31]:
4/2    # Note that dividing with / always gives a float, even when the result is a whole number

2.0

In [32]:
2.0 * 3    # Mixing integers and floats results in a float

6.0

In [33]:
a = 'this is the first half'
b = 'and this is the second half'
print(a + b)

this is the first halfand this is the second half


In [34]:
print(a + ' ' + b)

this is the first half and this is the second half


In [35]:
7<(1/3)

False

We can also compare values with any of `<`, `<=`, `>`, `>=`, `==` and `!=`. Most of these are self-explanatory, but it is worth contrasting `==` with `=`. First, `=` is used to assign a value, not to compare: for example, `a = [1, 2, 3]` takes the list `[1, 2, 3]` and assigns it to `a`. If we want to test whether two values are equal (as we would in real-world mathematics), we instead use `==`. So `a == b` is `True` if both objects have the same value (they are equal), and `a != b` is `True` if they do not. A related (and subtly different) idea is the `is` keyword. If we want to check whether two references refer to the same object, we use `is`; to check that they refer to different objects, we use `is not`.

__Question:__ So what is the difference between `==` and `is`?

__Answer:__ The `==` operator compares the values of two objects, whereas the Python `is` operator checks whether two variables point to the same object in memory. Checking identity with `is` is generally cheaper than checking equality with `==`, but as they are different checks, care must be taken: two distinct lists can be equal without being the same object.

In [36]:
a = [1, 2, 3]

print(a)

[1, 2, 3]


In [37]:
b = a

print(b)

[1, 2, 3]


In [38]:
c = list(a)

print(c)

[1, 2, 3]


In [39]:
a is b

True

In [40]:
a == b

True

In [41]:
a is not c

True

In [42]:
a != c

False

In [43]:
a = None

a is None

True

In [44]:
car = 'toyota'
Car = car.upper()

print(car)
print(Car)

toyota
TOYOTA


In [45]:
car == Car

False

In [46]:
car == 'toyota'

True

In [47]:
car != 'ford'

True

In [48]:
age_0 = 21
age_1 = 42

age_0 >= 18 and age_1 >= 21    # We can also check multiple conditions

True

In [49]:
discworld = ['Death', 'Rincewind', 'The Luggage', 'The Librarian', 'Granny Weatherwax', 'sam vimes']

'Death' in discworld

True

As a final comment, we consider _mutable_ and _immutable_ objects. Most objects in Python, such as lists, dictionaries, NumPy arrays and most user-defined types (classes), are mutable. This means that the object, or the values it contains, can be modified. However, other objects, like strings and tuples, are immutable: they cannot be modified.

In [50]:
a_list = ['abc', 2, [3, 4, 5]]
print(a_list)

a_list[2] = (3, 4)
print(a_list)

['abc', 2, [3, 4, 5]]
['abc', 2, (3, 4)]


In [51]:
a_tuple = (3, 5, (4, 5))

a_tuple[1] = 'four'

TypeError: 'tuple' object does not support item assignment

In [52]:
a = 'this is a string'
a[9] = 'f'

TypeError: 'str' object does not support item assignment

In [53]:
b = a.replace('string', 'longer string')    # Strings are immutable, so replace() returns a new string instead of changing a
print(a)
print(b)

this is a string
this is a longer string


Of course, just because we _can_ mutate an object, does not mean we _should_. Such actions are known as _side effects_. For example, when writing a function (coming soon), any side effects should be explicitly communicated to the user in the function's documentation or comments. If possible, try to avoid side effects and favour immutability, even when mutable objects are involved.
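To make the idea of a side effect concrete, here is a sketch of two hypothetical functions, `add_item_in_place` and `add_item_copy`: the first mutates the list it is given, the second leaves it alone and returns a new list. Note how the first documents its side effect in its docstring.

```python
def add_item_in_place(items, item):
    """Append item to items. Side effect: the caller's list is modified."""
    items.append(item)


def add_item_copy(items, item):
    """Return a new list with item appended; items itself is left unchanged."""
    return items + [item]


original = ['a', 'b']
add_item_in_place(original, 'c')
print(original)    # ['a', 'b', 'c']: the function changed our list

extended = add_item_copy(original, 'd')
print(original)    # ['a', 'b', 'c']: unchanged this time
print(extended)    # ['a', 'b', 'c', 'd']
```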

__Exercise__
- Write a series of conditional tests. Print a statement describing each test and your prediction for the results of each test. Your code should look something like the following:

`car = 'toyota'`

`print("Is car == 'ford'? I predict False.")`

`print(car == 'ford')`


`print("\nIs car == 'toyota'? I predict True.")`

`print(car == 'toyota')`
- Look closely at the results and make sure you understand why each line evaluates to `True` or `False`.
- Create at least ten tests with at least five evaluating to `True`. Try to include methods like `.lower()`, and include tests with strings, tests with numbers and tests with lists.

## 1.4 Functions and Object Method Calls

Functions are simply named blocks of code designed to perform a specific task. We then _call_ the function to actually perform this task. Functions are useful for blocks of code you expect to repeat several times throughout your programme. By putting this code in a function, not only will you save time, but you will also make your code more readable and easier to fix or alter.

In [54]:
# Simple example

def greet_user():
    """Display a simple greeting."""
    print('Hello, user!')

In [55]:
greet_user()

Hello, user!


In [56]:
# Improve our function

def greet_user2(name):
    """Display a simple greeting with a name."""
    print(f'Hello, {name.title()}!')

In [57]:
greet_user2('john')

Hello, John!


The above examples demonstrate the simplest structure of a function. First, we use the keyword `def` to tell Python we wish to define a function. We then give the function name and, if applicable, the information required for the function to do its job (in the first example, no information was required, while the second example needed a string as input). This information is fed to the function inside parentheses. Finally, we end this first line (the function header) with a colon.

Any indented lines which follow `def function_name():` will make up the body of the function. This involves all the code that we want to run when the function is called. The exception is what we enclosed in triple quotes. Inside these is what we call a _docstring_ and it is used to describe what the function does. Python looks for this when it generates documentation for the function.

In [58]:
print(greet_user.__doc__)

Display a simple greeting.


The variable `name` in the definition of `greet_user2` is an example of a _parameter_. As we have said, this is a piece of information that the function needs to do its job. The value (`'john'`) we then passed to `greet_user2` is what we call an _argument_. An argument is a piece of information that is passed from a function call to a function. When we do so, we assign the argument to the parameter and the function executes its code.

If we have two or more parameters then we can feed the arguments in the same order as the parameters (i.e. first argument is assigned to first parameter, second argument to second parameter, and so on). These are called _positional arguments_ since their position matters.

An alternative approach is to use _keyword arguments_, where each argument consists of a parameter name and a value. Here, we have a name-value pair that we pass to a function. As opposed to positional arguments, this time we directly associate the name and the value within the argument, so when we pass the argument to the function, there is no confusion (with positional arguments, it is easy to mix up the order). In this way, we no longer have to worry about correctly ordering our arguments when we call a function.

In [59]:
# Use positional arguments

def greet_user3(name_1,name_2):
    """Display a simple greeting with two names."""
    print(f'Hello, {name_1.title()} and {name_2.title()}!')

In [60]:
greet_user3('john','vani')

Hello, John and Vani!


In [61]:
# Use keyword arguments

def describe_pet(animal_type, pet_name):
    """Display information about a pet."""
    print(f"\nI have a {animal_type}.")
    print(f"My {animal_type}'s name is {pet_name.title()}.")
          
describe_pet(animal_type='dog', pet_name='emmy')


I have a dog.
My dog's name is Emmy.


Whether we use positional or keyword arguments only matters when we call the function; the function definition itself is the same either way.
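To see that keyword arguments are matched by name rather than by position, the sketch below uses a hypothetical variant of `describe_pet`, called `pet_description`, that returns its sentence instead of printing it, so the results can be compared.

```python
def pet_description(animal_type, pet_name):
    """Return (rather than print) a sentence about a pet."""
    return f"My {animal_type}'s name is {pet_name.title()}."


# Keyword arguments can be given in any order
first_call = pet_description(animal_type='dog', pet_name='emmy')
second_call = pet_description(pet_name='emmy', animal_type='dog')
print(first_call == second_call)   # True

# Positional arguments are matched by position, so swapping them changes the result
swapped = pet_description('emmy', 'dog')
print(swapped)                     # My emmy's name is Dog.
```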

We can also define _default values_ for each parameter. If an argument for a parameter is provided in the function call, Python uses the argument value; if not, it uses the parameter's default value. So, when we define a default value for a parameter, we can omit the corresponding argument from the function call. Using default values can simplify our function calls, but we must be careful with ordering: parameters with default values must come after all parameters without them, otherwise Python raises a `SyntaxError`. This is why `pet_name` now comes first in the definition below.
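A sketch of the ordering rule: the hypothetical helper `is_valid_python` asks Python's built-in `compile()` whether a piece of source code is valid. A definition that puts a parameter without a default after one with a default is rejected with a `SyntaxError` (the exact error message varies between Python versions, so it is not printed here).

```python
def is_valid_python(source):
    """Return True if source compiles, False if it raises a SyntaxError."""
    try:
        compile(source, '<example>', 'exec')
        return True
    except SyntaxError:
        return False


bad = "def describe_pet(animal_type='dog', pet_name):\n    pass\n"
good = "def describe_pet(pet_name, animal_type='dog'):\n    pass\n"

print(is_valid_python(bad))    # False: default before non-default
print(is_valid_python(good))   # True
```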

In [62]:
# Use default values

def describe_pet(pet_name, animal_type='dog'):
    """Display information about a pet. Default animal type is a dog"""
    print(f"\nI have a {animal_type}.")
    print(f"My {animal_type}'s name is {pet_name.title()}.")
          
describe_pet('emmy')


I have a dog.
My dog's name is Emmy.


In [63]:
describe_pet('Gauss', 'cat')


I have a cat.
My cat's name is Gauss.


Objects in Python typically have both attributes (other Python objects stored 'inside' the object) and methods (functions associated with an object that can have access to the object's internal data). Both of them are accessed via the syntax `obj.attribute_name`.

Try typing `a.<Tab>` for some variable `a`.

Often, we do not care about the type of an object, only about whether it has certain methods or behaviour. This is sometimes called _duck typing_ after the saying 'If it walks like a duck and quacks like a duck, then it's a duck.' For example, suppose we want to verify if an object is iterable. For many objects, this means it has an `__iter__` _magic method_, though an alternative and better way to check is to try using the `iter` function.

In [64]:
def isiterable(obj):
    try:
        iter(obj)
        return True
    except TypeError: # not iterable
        return False
    
isiterable('a string')

True

In [65]:
isiterable([1, 2, 3])

True

In [66]:
isiterable(5)

False

This can often be used when writing functions that can accept multiple kinds of input. For example, consider a function that can accept any kind of sequence (`list`, `tuple`, `ndarray`), or even an iterator. We can first check if the object is a list (or a NumPy array) and, if not, convert it to be one.
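A minimal sketch of that idea using only built-in types (a NumPy array check is left out to keep it self-contained). `to_list` is a hypothetical helper name, and `isiterable` is repeated from above so the block runs on its own.

```python
def isiterable(obj):
    """Return True if iter() accepts obj, i.e. obj is iterable."""
    try:
        iter(obj)
        return True
    except TypeError:
        return False


def to_list(obj):
    """Return obj as a list, converting other iterables (hypothetical helper)."""
    if isinstance(obj, list):
        return obj               # already a list: use it as-is
    if isiterable(obj):
        return list(obj)         # tuples, strings, ranges, generators, ...
    raise TypeError(f'{obj!r} is not iterable')


print(to_list((1, 2, 3)))   # [1, 2, 3]
print(to_list(range(3)))    # [0, 1, 2]
print(to_list('abc'))       # ['a', 'b', 'c']
```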

__Exercises__
- Write a function called _favourite\_book()_ which accepts one parameter, `title`. The function should print a message saying that your favourite book is whatever title is passed to the function. Next, alter your function so that it accepts two parameters, `title` and `author`, and update the message accordingly.
- Write a function called _describe\_city()_ that accepts the name of a city and its country. The function should then print a simple sentence saying what the city is and which country it is in. Then, give the country parameter a default value. Call the function a few different times with different cities.
- Explore how to make an argument optional.

## 1.5 Control Flow 

Python has several built-in keywords for conditional logic, loops and other control flow concepts found in other programming languages. The three main keywords are `if`, `for` and `while`. We will briefly discuss each of these.

### 1.5.1 for loops

When working with lists (or other list-like objects), we often want to run through all entries, performing the same task with each item. There are two main kinds of loop: `for` loops and `while` loops. Over the next two subsections we will describe each. First, we consider `for` loops. These are useful when we want to do the same action with every item in a list. As might be expected, `for` loops are among the most common ways of automating repetitive tasks.

In [67]:
for i in range(10):
    a = i**2
    print(a)

0
1
4
9
16
25
36
49
64
81


In the above, `range(10)` gives us the 10 integers from 0 to 9, starting from 0 and increasing by 1 each time. We use `for` to start the loop. This tells Python to take each number in turn and associate it with the variable `i` (we can call it anything we like, apart from reserved names of course). Python then squares this number and prints the output, and it does this until it reaches the end of the range. It might help to read the code as, 'Square every integer from 0 to 9, printing the square as you go.'

As with functions, a `for` loop makes use of a colon and indentations to separate the code from the rest of the programme. It is also worth keeping in mind that the set of steps is repeated once for each item in the list, no matter how many items are in that list. If we have a million items, then Python repeats each step one million times. Depending on the code in the step, this is usually quite quick, but this is something which should be kept in mind.
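Strictly speaking, `range(10)` does not build a list: it is a lazy `range` object that produces its numbers one at a time, and we can turn it into a list with `list()`. A short sketch, also showing the optional start and step arguments:

```python
print(range(10))                # range(0, 10): a range object, not a list
print(list(range(10)))          # [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]

# range(start, stop, step): stop itself is never included
print(list(range(2, 10, 3)))    # [2, 5, 8]
print(list(range(5, 0, -1)))    # [5, 4, 3, 2, 1]
```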

We can iterate over any object that is iterable. For example, we might have a dictionary representing people's names and ages. We can then loop over this.

In [68]:
age_dict = {'tom': 23, 'dick': 75, 'harry': 41}

for name in age_dict.keys():
    print(f"{name.title()} is {age_dict[name]} years old.")

Tom is 23 years old.
Dick is 75 years old.
Harry is 41 years old.
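A sketch of two common variations on the loop above: `.items()` gives each key and its value together, so no lookup is needed, and the built-in `enumerate()` adds a running count to any iterable.

```python
age_dict = {'tom': 23, 'dick': 75, 'harry': 41}

# .items() yields (key, value) pairs, unpacked here into name and age
for name, age in age_dict.items():
    print(f'{name.title()} is {age} years old.')

# enumerate() pairs each item with a counter, starting from 1 here
for position, name in enumerate(age_dict, start=1):
    print(position, name)
```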


__Exercises:__
- Try creating a `for` loop but 'forget' to indent your code. Similarly, try indenting a piece of your code which does not require indentation. Finally, try 'forgetting' to type the colon when creating a `for` loop.
- Think of at least three different animals that have a common characteristic. Store the names of these animals in a list and then use a `for` loop to print out the name of each animal. Next, modify your programme to also print out a statement about each animal. Finally, add in a line at the end of your programme (not part of the `for` loop) stating what these animals have in common.
- Use a `for` loop to print the numbers from 1 to 20.
- Create a list of the numbers from one to one million and then use a `for` loop to print the numbers. If the output takes too long, stop it by pressing CTRL+C or by closing the output window.

### 1.5.3 While loops

Whereas the `for` loop takes a collection of items and executes a block of code once for each item in that collection, a `while` loop instead runs for as long as a certain condition is true. The construction is very similar: we use a colon and indentation as usual. However, for a `while` loop we simply supply a condition, which is checked before each pass through the loop; as soon as it evaluates to `False`, the loop stops.

Suppose we want to calculate a sequence of values $2^n$ and carry on going until $2^n>65536$. We can do this as follows:

In [69]:
n = 0

while 2**n <= 65536:
    print(n, 2**n)
    # Increase the value of n by 1
    n += 1

0 1
1 2
2 4
3 8
4 16
5 32
6 64
7 128
8 256
9 512
10 1024
11 2048
12 4096
13 8192
14 16384
15 32768
16 65536
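Once the condition fails, Python skips the loop body and carries on with the rest of the programme. A small sketch wrapping the same loop in a function (the name `first_power_exceeding` is hypothetical) shows which value of `n` ends the loop:

```python
def first_power_exceeding(limit):
    """Return the smallest n >= 0 for which 2**n > limit."""
    n = 0
    # The condition is checked before every pass through the loop body
    while 2**n <= limit:
        n += 1
    return n

n = first_power_exceeding(65536)
print(n, 2**n)  # 17 131072
```

The last value printed by the loop above is 65536 ($n = 16$); the loop ends because $2^{17} = 131072 > 65536$.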


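In general, loops written in pure Python can be slow when they run over many items, because each iteration is executed step by step by the interpreter. As a rule of thumb, if a built-in function or library routine already does the job, it is usually faster and clearer to use it instead of an explicit loop. We will see some ways to do this later.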
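As a small illustration, summing the integers from 1 to 100 with an explicit loop and with the built-in `sum()` gives the same answer, but `sum()` does the iteration internally, which is shorter to write and typically faster. This is only a sketch; how much time is saved depends on the task.

```python
values = range(1, 101)

# Accumulating the total with an explicit loop
total_loop = 0
for v in values:
    total_loop += v

# Letting the built-in sum() do the iteration for us
total_builtin = sum(values)

print(total_loop, total_builtin)  # 5050 5050
```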

__Exercise:__
- Write four loops, two of which should use a `for` loop and two a `while` loop. Could you alter any of your loops to use the other kind of loop (e.g. write your `for` loop using a `while` loop, and vice versa)?

### 1.5.4 if, elif and else

Programming often involves examining a set of conditions and deciding which action to take based on those conditions. Python allows us to examine the current state of a programme and respond appropriately using an `if` statement. We start with a simple example.

In [70]:
discworld = ['Death', 'Rincewind', 'The Luggage', 'The Librarian', 'Granny Weatherwax', 'sam vimes']

for char in discworld:
    if char.lower() == 'death':
        print('Death is probably my favourite character.')
    elif char.lower() == 'sam vimes':
        print('Death may be my favourite character, but Sam Vimes is pretty awesome too.')
    elif char.lower() == 'granny weatherwax':
        print('Death may be my favourite character, but Granny Weatherwax is pretty awesome too.')
    else:
        print(f'{char} is pretty great too, though.')

Death is probably my favourite character.
Rincewind is pretty great too, though.
The Luggage is pretty great too, though.
The Librarian is pretty great too, though.
Death may be my favourite character, but Granny Weatherwax is pretty awesome too.
Death may be my favourite character, but Sam Vimes is pretty awesome too.


The loop in this example first checks whether the current value of `char`, converted to lower case with `.lower()`, is 'death'. If it is, the first message is printed. If not, it next checks whether it is 'sam vimes', then 'granny weatherwax', and if all of these tests fail, it prints the final message. Because of `.lower()`, the comparisons ignore capitalisation, so both 'Death' and 'sam vimes' are matched.

When creating conditional statements, we can combine comparisons using logical operators. For example, we could use `and` together with `<` and `>` to decide whether a number falls within an interval, or use `or` to test whether at least one of several conditions holds.
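A minimal sketch of both ideas; the function names `in_interval` and `is_weekend` are made up for illustration. In Python the logical operators are the lowercase keywords `and` and `or`.

```python
def in_interval(x, lo, hi):
    """Return True if lo < x < hi, combining two comparisons with 'and'."""
    return x > lo and x < hi

def is_weekend(day):
    """Return True if day is 'saturday' or 'sunday', ignoring case."""
    day = day.lower()
    return day == 'saturday' or day == 'sunday'

for x in [-1, 3, 7]:
    if in_interval(x, 0, 5):
        print(f"{x} lies in the interval (0, 5)")
    else:
        print(f"{x} lies outside the interval (0, 5)")

print(is_weekend('Sunday'), is_weekend('Monday'))  # True False
```

The test `x > lo and x < hi` can also be written as the chained comparison `lo < x < hi`, as in the `0 < x < 5` example later in this section.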

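Note that Python stops checking as soon as one of the _conditional tests_ comes back `True`. In other words, if more than one condition is true, only the block belonging to the first true condition runs. This can lead to subtle bugs if the conditions are not ordered carefully. In the next example, `a == 4` is true, but the earlier test `a > 2` is also true, so `a/b` is never computed.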

In [71]:
a = 4
b = 3

if a < 2:
    print(a+b)
elif a > 2:
    print(a-b)
elif a == 4:
    print(a/b)

1


Note how we can use multiple `elif` blocks and do not need to use `else` at all if we do not want to.

In [72]:
x = 12.5

if x < 0:
    print("It's negative.")
elif x == 0:
    print("It is equal to zero.")
elif 0 < x < 5:
    print("It is positive but smaller than five.")
else:
    print("It is positive and greater than or equal to five.")

It is positive and greater than or equal to five.


__Exercise:__ Make the above into a function.

We can skip the rest of the current iteration and move straight on to the next one by combining an `if` statement with the `continue` keyword. Consider the following code, which sums the integers in a list and skips any `None` values.

In [73]:
sequence = [1, 2, 3, 4, None, 5, 6, 7, None, 8, 9]
total = 0

for value in sequence:
    if value is None:
        continue
    total += value

print(total)

45


A loop can be exited altogether with the `break` keyword. `break` only terminates the innermost loop containing it, so if we have a `for` loop inside another `for` loop and use `break` in the inner loop, the outer loop will continue to run.

In [74]:
sequence = [1, 2, 3, 4, None, 5, 6, 7, None, 8, 9]
total = 0

for value in sequence:
    if value is None:
        continue
    elif value == 5:
        break
    total += value

print(total)

10


In [75]:
for i in range(4):
    for j in range(4):
        if j > i:
            break
        print((i,j))

(0, 0)
(1, 0)
(1, 1)
(2, 0)
(2, 1)
(2, 2)
(3, 0)
(3, 1)
(3, 2)
(3, 3)


__Exercises:__
- Suppose we are creating a video game in which we shoot down aliens. Create a variable called `alien_colour` and assign it a value of `red`, `white` or `blue`. Write an `if` statement to test whether the alien's colour is `white`. If it is, print a message saying the player just earned 5 points. Then modify your code so that if the alien's colour is blue the code prints a message saying the player just earned 10 points, and if the alien's colour is red then the player earned 20 points. Write three versions of this code, each making different use of an if, elif, else chain.
- Write an if, elif, else chain that determines a person's stage of life. Set a value for the variable age and then print a message saying the person is a baby if they are less than 2 years old, a toddler if they are less than 4 but at least 2 years old, and so on. You should have at least 6 different conditional statements to check.

## 1.6 Magic commands

IPython adds some enhancements on top of normal Python syntax, known as _magic commands_, which are prefixed by the `%` character. These commands are designed to solve various common problems in standard data analysis succinctly. They come in two flavours: _line magics_, which are denoted by a single `%` prefix and operate on a single line of input, and _cell magics_, which are denoted by a double `%%` prefix and operate on multiple lines of input.

### 1.6.1 %run

As we begin to develop more extensive code, we will find ourselves using IPython for interactive exploration, together with a text editor to store code that we want to reuse. Rather than running this code in a new window, it can be convenient to run it within our IPython session. This can be done with the `%run` magic command. For example, suppose we have created a file called _myscript.py_ with the following contents:

# file: myscript.py

def square(x):
    """Square a number."""
    return x ** 2


for N in range(1, 4):
    print(f"{N} squared is {square(N)}")

In [79]:
%run myscript.py

1 squared is 1
2 squared is 4
3 squared is 9


In [80]:
# After we have run the script, any functions defined within it are available for use in our IPython session

square(5)

25

There are several options to fine-tune how our code is run. We can see the documentation by typing `%run?` in the IPython interpreter:

In [81]:
%run?

### 1.6.2 %timeit

This magic function will automatically determine the execution time of the single-line Python statement that follows it. For example, suppose we want to check the performance of a list comprehension:

In [82]:
%timeit L = [n ** 2 for n in range(1000)]

250 µs ± 11.3 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)


`%timeit` automatically repeats the statement many times (above, 7 runs of 1000 loops each) in order to obtain more robust results. Adding a second `%` sign turns it into the cell magic `%%timeit`, which can time multiple lines of input. For example, below is the equivalent construction with a `for` loop:

In [83]:
%%timeit
L = []
for n in range(1000):
    L.append(n ** 2)

308 µs ± 5.74 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)


In this case, the list comprehension takes roughly 19% less time than the equivalent `for` loop (about 250 µs against 308 µs), although exact timings will vary from machine to machine.
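`%timeit` belongs to IPython, so it is not available in a plain Python script. A rough equivalent uses the standard-library `timeit` module. The sketch below wraps each version in a function (the names `with_comprehension` and `with_loop` are just for illustration) and passes it to `timeit.timeit`, which calls it `number` times and returns the total time in seconds. Unlike `%timeit`, this makes a single measurement rather than 7 runs, and the numbers will vary between machines and Python versions.

```python
import timeit

def with_comprehension():
    return [n ** 2 for n in range(1000)]

def with_loop():
    L = []
    for n in range(1000):
        L.append(n ** 2)
    return L

# Both functions build exactly the same list
assert with_comprehension() == with_loop()

# Call each function 1000 times and record the total time in seconds
t_comp = timeit.timeit(with_comprehension, number=1000)
t_loop = timeit.timeit(with_loop, number=1000)

# Convert to microseconds per call for comparison with %timeit's output
print(f"comprehension: {t_comp / 1000 * 1e6:.1f} µs per call")
print(f"for loop:      {t_loop / 1000 * 1e6:.1f} µs per call")
```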

Like normal Python functions, magic functions have docstrings, and this documentation can be accessed in the standard manner. For example, to read the documentation for `%timeit`, we simply type `%timeit?`. Documentation for other magic functions can be accessed similarly. For a general description of the available magic functions, including some examples, we can type `%magic`. For a quick list of all available magic functions, type `%lsmagic`.

As a final note, we can also define our own magic functions, but we will not discuss this here.

__Exercises:__
- Make a list of the numbers from one to one million, and then use `min()` and `max()` to make sure the list actually starts at 1 and ends at 1,000,000. Use the `sum()` function to see how quickly Python adds one million numbers, and compare this with a `for` loop that does the same task.
- Read the docstrings for `%run` and `%timeit`.

## 1.7 Dates and times

Python has a built-in `datetime` module which provides `datetime`, `date` and `time` types. The `datetime` type combines the information stored in `date` and `time`, and is the most commonly used.

In [84]:
from datetime import datetime, date, time

dt = datetime(2022, 12, 19, 13, 28, 36)

print(dt.day)
print(dt.minute)

19
28


In [85]:
print('The date is:',dt.date())
print('The time is:',dt.time())

The date is: 2022-12-19
The time is: 13:28:36


In [86]:
# The strftime method formats a datetime as a string

dt.strftime('%d/%m/%Y %H:%M')

'19/12/2022 13:28'

Strings can be converted (parsed) into `datetime` objects with the `datetime.strptime` method, which takes the string together with a format string. The same format codes are used by `strftime`. Some of the most common codes are:

- '%Y' codes for a four-digit year
- '%y' codes for a two-digit year
- '%m' codes for a two-digit month (01 to 12)
- '%d' codes for a two-digit day (01 to 31)
- '%H' codes for the hour on a 24-hour clock (00 to 23)
- '%I' codes for the hour on a 12-hour clock (01 to 12)
- '%M' codes for a two-digit minute (00 to 59)
- '%S' codes for a two-digit second (00 to 59)
- '%w' codes for the weekday as an integer (0 for Sunday, 1 for Monday, ..., 6 for Saturday)
- '%U' codes for the week number of the year (00 to 53; Sunday is considered the first day of the week, and days before the first Sunday of the year are in "week 0")
- '%W' codes for the week number of the year (00 to 53; Monday is considered the first day of the week, and days before the first Monday of the year are in "week 0")
- '%F' is a shortcut for %Y-%m-%d (e.g. 2022-12-19)
- '%D' is a shortcut for %m/%d/%y (e.g. 12/19/22)

Note that '%F' and '%D' are not among the codes Python guarantees on every platform, so they may not work everywhere; '%Y-%m-%d' and '%m/%d/%y' are the portable equivalents.
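Parsing is the reverse of `strftime`: `datetime.strptime` takes a string and a format built from codes like those above, and returns a `datetime`. A short sketch using the same moment as `dt` earlier:

```python
from datetime import datetime

# Each format code must match the corresponding part of the string
parsed = datetime.strptime('19/12/2022 13:28', '%d/%m/%Y %H:%M')
print(repr(parsed))  # datetime.datetime(2022, 12, 19, 13, 28)

# A different layout needs a different format string
parsed_iso = datetime.strptime('2022-12-19 13:28:36', '%Y-%m-%d %H:%M:%S')
print(parsed_iso == datetime(2022, 12, 19, 13, 28, 36))  # True

# '%w' gives the weekday number: 19 December 2022 was a Monday
print(parsed_iso.strftime('%w'))  # 1
```

If the string does not match the format, `strptime` raises a `ValueError`.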

When we are aggregating or otherwise grouping time series data, it will occasionally be useful to replace the time fields of a series of `datetime` objects, for example setting the minute and second fields to zero.

In [87]:
dt.replace(minute=0,second=0)

datetime.datetime(2022, 12, 19, 13, 0)

Since `datetime.datetime` is an immutable type, methods like these always produce new objects.
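A quick check of this: calling `replace` returns a new object and leaves the original `dt` untouched. The name `on_the_hour` is just for illustration.

```python
from datetime import datetime

dt = datetime(2022, 12, 19, 13, 28, 36)

# replace() builds a new datetime; it does not modify dt in place
on_the_hour = dt.replace(minute=0, second=0)

print(repr(on_the_hour))  # datetime.datetime(2022, 12, 19, 13, 0)
print(repr(dt))           # datetime.datetime(2022, 12, 19, 13, 28, 36)
print(on_the_hour is dt)  # False
```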

The difference of two `datetime` objects produces a `datetime.timedelta` type.

In [88]:
dt2 = datetime(2023,5,3,9,0)

delta = dt2 - dt
print(delta)
print(type(delta))

134 days, 19:31:24
<class 'datetime.timedelta'>


Adding a `timedelta` to a `datetime` produces a new shifted `datetime`.

In [89]:
dt

datetime.datetime(2022, 12, 19, 13, 28, 36)

In [90]:
dt + delta

datetime.datetime(2023, 5, 3, 9, 0)
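A `timedelta` can also be built directly, and its parts inspected. A sketch reusing the dates above; note that a `timedelta` stores days and seconds separately, so `.seconds` is only the part left over after whole days, whereas `.total_seconds()` covers the full span. The name `one_week` is just for illustration.

```python
from datetime import datetime, timedelta

dt = datetime(2022, 12, 19, 13, 28, 36)
dt2 = datetime(2023, 5, 3, 9, 0)
delta = dt2 - dt

print(delta.days)             # 134
print(delta.seconds)          # 70284, i.e. 19 h 31 min 24 s
print(delta.total_seconds())  # 11647884.0

# Building a timedelta by hand and shifting a datetime with it
one_week = timedelta(weeks=1)
print(dt + one_week)          # 2022-12-26 13:28:36
```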

## Summary Problems

1. What is 13 to the power of 5?

2. Split the following string into a list:

    `string = "My name is John."`

3. Suppose we are given the variables:

    `planet = "Earth"`

    `diameter = 12742`

Use the `.format()` method to print the following string:

    `The diameter of Earth is 12742 kilometers.`

4. Given the following nested list, use indexing to grab the word "hello":

`lst = [1,2,[3,4],[5,[6,7,['hello']],8,9],10,42]`

5. Given the following nested dictionary, grab the word "hello":

`d = {'k1':[1,2,3,{'four':['five','six','seven',{'eight':[9,10,11,'hello']}]}]}`

6. What is the main difference between a tuple and a list?

7. Create a function that will grab the email website domain from a string in the form: `user@domain.com`.

8. Create a function that returns False if the word 'data' is not contained in an input string. If 'data' is contained in the input string, return the number of times it is mentioned.

## Summary Problems - Solutions

1. What is 13 to the power of 5?

In [91]:
print(13 * 13 * 13 * 13 * 13)   # One option

print(13 ** 5)                  # A better option

371293
371293


2. Split the following string into a list:

    `string = "My name is John."`

In [92]:
string = "My name is John."

In [93]:
string.split()

['My', 'name', 'is', 'John.']

3. Suppose we are given the variables:

    `planet = "Earth"`

    `diameter = 12742`

Use the `.format()` method to print the following string:

    `The diameter of Earth is 12742 kilometers.`

In [94]:
planet = "Earth"
diameter = 12742

In [95]:
print("The diameter of {a} is {b} kilometers.".format(a=planet, b=diameter))

The diameter of Earth is 12742 kilometers.


4. Given the following nested list, use indexing to grab the word "hello":

`lst = [1,2,[3,4],[5,[6,7,['hello']],8,9],10,42]`

In [96]:
lst = [1,2,[3,4],[5,[6,7,['hello']],8,9],10,42]

In [97]:
lst[3][1][2][0]

'hello'

5. Given the following nested dictionary, grab the word "hello":

`d = {'k1':[1,2,3,{'four':['five','six','seven',{'eight':[9,10,11,'hello']}]}]}`

In [98]:
d = {'k1':[1,2,3,{'four':['five','six','seven',{'eight':[9,10,11,'hello']}]}]}

In [99]:
d['k1'][3]['four'][3]['eight'][3]

'hello'

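6. What is the main difference between a tuple and a list? A tuple is immutable: once it has been created, its elements cannot be changed, added or removed, whereas a list can be modified in place.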

7. Create a function that will grab the email website domain from a string in the form: `user@domain.com`.

In [100]:
def returndomain(email):
    return email.split('@')[-1]

In [101]:
returndomain('j.evans8@herts.ac.uk')

'herts.ac.uk'

8. Create a function that returns False if the word 'data' is not contained in an input string. If 'data' is contained in the input string, return the number of times it is mentioned.

In [108]:
def findcountData(st):
    # Split the lower-cased string into words and count exact matches of 'data'
    count = st.lower().split().count('data')

    # Return False if 'data' never appears, otherwise the number of occurrences
    if count == 0:
        return False
    return count

In [109]:
string = 'This is a test string including the word data in which we include data again to see if we can count.'

In [110]:
findcountData(string)

2

In [111]:
string2 = 'Now we include a string without mentioning that word.'

In [112]:
findcountData(string2)

False