BASC0038 Algorithms, Logic and Structure

# Week 1: Introduction to Python and Algorithms

---





# Hello, world!

Python is one of the most popular and widely used programming languages. Its combination of simplicity, expressiveness and power makes it well suited not just to beginners, but to scientists, statisticians, software engineers and beyond. As a programming language, it is:


*   **General-purpose**: it can be used to solve a wide variety of tasks across a wide variety of domains.
*   **High-level**: for common use, it abstracts away low-level computing and implementation issues such as memory management.
*   **Interpreted**: the written code can be run directly, as opposed to needing to be **compiled** into an executable program such as an .exe file.
*   **Dynamically typed**: unlike in a **statically typed** language, you don't need to explicitly specify the type of a variable (e.g. integer or string). It is still **strongly typed**, however, so ill-defined operations between incompatible types raise errors rather than being silently converted.
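
As a small illustration of the last point, the sketch below rebinds one name to values of different types and then attempts an ill-defined operation. The variable names are chosen purely for illustration, and `try`/`except` is used only to show the error without stopping the cell.

```python
# Dynamic typing: the same name can be rebound to values of different types.
value = 42
print(type(value).__name__)  # int
value = "forty-two"
print(type(value).__name__)  # str

# Strong typing: adding a string to an integer is an error, not a silent conversion.
try:
  result = "1" + 1
except TypeError as error:
  result = None
  print("TypeError:", error)
```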

You can use Python to code entire software projects but, as an interpreted language, individual statements can also be run one at a time in Python's interactive interpreter, which acts as a command line interface (CLI). This notebook contains code cells such as the one below, which uses the `print` function to output some text:

In [None]:
print("Hello, world!")

Hello, world!


The quotation marks denote a *string* value, i.e. a sequence of text characters. You can use either single (`'`) or double (`"`) marks for this; unless you're following a particular style guide, it's up to you, but the most important thing is always consistency!
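
A quick sketch of the two quoting styles (the variable names are illustrative only): both produce the same string, and picking the other kind of mark is a convenient way to include a quote character in the text.

```python
single = 'Hello, world!'
double = "Hello, world!"
print(single == double)  # True: the choice of marks does not change the value

# Using the other kind of mark avoids having to escape a quote inside the text.
print("It's a string")
print('She said "hi"')
```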

When individual statements are run on the command line like this, the resulting value is output automatically without needing to use `print`:

In [None]:
4 + 3 * 2

10

Note that the proper mathematical order of operations is honoured, i.e. the multiplication is resolved before the addition.

Python provides all the standard numerical operations you might expect:


*   `+` Addition
*   `-` Subtraction
*   `*` Multiplication
*   `/` Division (floating-point arithmetic, e.g. 5 / 2 = 2.5)
*   `//` Floor division (integer arithmetic, e.g. 5 // 2 = 2)
*   `%` Modulo (division remainder, where the sign of the result is the same as that of the divisor, e.g. 5 % (-2) = -1)
*   `**` Exponentiation (e.g. 2**3 = 8)
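
The less familiar operators are easiest to understand by trying them with negative operands. The cell below is a short sketch; the values of `a` and `b` are arbitrary.

```python
print(5 / 2)    # 2.5
print(4 / 2)    # 2.0 (a float, even when the division is exact)
print(5 // 2)   # 2
print(-5 // 2)  # -3 (rounds down, towards negative infinity)
print(5 % -2)   # -1 (sign follows the divisor)
print(-5 % 2)   # 1
print(2 ** 3)   # 8

# Floor division and modulo fit together: a == (a // b) * b + a % b
a, b = 5, -2
print((a // b) * b + a % b == a)  # True
```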



# Variables

Just like in mathematics, we can assign values to *variables* which are referred to and manipulated by name. The assignment operator `=` assigns the value on its right to the variable named on its left:

In [None]:
a = 2
x = a * 5
print(2 * x**2 + 3 * x - 4.5)

225.5


Once initialised, a variable can be reassigned as many times as you wish. Note that the cell above must be run before the one below; otherwise, the interpreter won't know the value of `x` needed to calculate the right-hand side.

In [None]:
x = x + 6
print(x)

16


Above, we used a variable's current value as part of its new value. Special assignment operators are also available, such that the above can be more simply written:

In [None]:
x += 6
print(x)

22


Variables can be called almost anything you wish, following certain rules:


*   Can only contain alphanumeric characters and underscores (no spaces!).
*   Cannot start with a number (but can contain numbers elsewhere).
*   Can contain underscores.
    *   Avoid starting variable names with underscores for now, as this holds special meaning by convention (e.g. private class members).
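
These rules can be checked programmatically. The sketch below uses the built-in `str.isidentifier` method together with the standard `keyword` module. The candidate names are made up for illustration, and the check for reserved keywords (such as `class`) is an extra rule not listed above.

```python
import keyword

candidates = ["total_area", "area2", "_hidden", "2nd_area", "total area", "class"]

# A usable name must be a valid identifier and must not be a reserved keyword.
usable = {name: name.isidentifier() and not keyword.iskeyword(name)
          for name in candidates}

for name, ok in usable.items():
  print(repr(name), ok)
```

The first three names are accepted; `2nd_area` starts with a number, `total area` contains a space, and `class` is reserved by the language.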





# Comments

Code can be documented in-place through the use of comments. Any line beginning with `#` is ignored by the interpreter as a comment:

In [None]:
# Length of a rectangle (in centimetres)
length = 10

A comment can also be *inline* (you should generally prefer the above, though):

In [None]:
height = 20  # Height of a rectangle (in centimetres)

You *must, must, must* get in the habit of properly commenting your code *right away*. Not only is it immensely helpful for anyone reading your code to understand what it does, it will be helpful for you to understand what your own code does when coming back to it the next month (or day (or hour)).

Comments should describe *what* is done and/or *why* it is done, but should not be used to state the obvious! An example of good commenting might be:

In [None]:
# Calculate the area of the rectangle (in square centimetres)
print(length * height)

200


In contrast, a disgustingly bad comment would be:

In [None]:
# Multiply the length by the height and print the result
print(length * height)

200


Common sense should prevail, but writing well-documented, clean code is an art to be learned. It's highly recommended to check out the [PEP 8 style guide](https://www.python.org/dev/peps/pep-0008/) for common guidance on comment styles, variable naming conventions etc.

# Control flow

## Functions

Speaking of writing clean code, one of the most foundational principles to keep in mind is **DRY: Don't Repeat Yourself!** If you find yourself writing the same piece of code more than once in a single project, a good start would be to write that piece of code as a *function*:

In [None]:
def hello_world():
  print("Hello, world!")

hello_world()
hello_world()
hello_world()

Hello, world!
Hello, world!
Hello, world!


Functions can take *parameters*, allowing you to pass *arguments* to them. A function can also `return` a value upon its completion.

In [None]:
def double(x):
  return x * 2

In [None]:
double(10)

20

In [None]:
double(double(30))

120

## Conditionals (`if`)

Usually, there will be bits of code we only want to execute conditionally, based on some form of predicate. A predicate is an expression that evaluates to a Boolean value, i.e. either `True` or `False`. Equality and comparison operators are commonly used for conditional predicates:

In [None]:
print(2 == 3)  # Equality: is equal?
print(3 < 9)  # Less than
print(2 > 10)  # Greater than
print(-1 <= 5)  # Less than or equal to

False
True
False
True


**Warning: Avoid the very common mistake of confusing the assignment operator `=` with the equality operator `==`**
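
To make the difference concrete, here is a minimal sketch. The built-in `compile` function is used only so that the faulty line can be shown without stopping the cell; in everyday code you would simply see a `SyntaxError` when running it.

```python
x = 3          # Assignment: binds the name x to the value 3
print(x == 3)  # Equality test: True
print(x == 4)  # Equality test: False

# Using = where == is needed is rejected before the code even runs.
try:
  compile("if x = 3:\n  print(x)", "<example>", "exec")
  accepted = True
except SyntaxError:
  accepted = False

print(accepted)  # False
```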

A Boolean value is inverted with the `not` operator:

In [None]:
print(not True)
print(not False)

False
True


Hence, the existence of the inequality operator:

In [None]:
2 != 10

True

The `and` operator only returns `True` if all of its predicates are `True`:

In [None]:
print(2 < 3 and 1 > 19)
print(2 < 3 and 1 > 0)

False
True


The `or` operator returns `True` if at least one of its predicates is `True`:

In [None]:
print(2 < 1 or 1 > 19)
print(2 < 3 or 1 > 19)
print(2 < 3 or 1 > 0)

False
True
True


A code block which should only be executed if a predicate is `True` is created using the `if` statement. An optional `else` block immediately following an `if` block is executed if the `if` block is not executed.

Additionally, the optional `elif` block is shorthand for *otherwise, if...* which allows for chaining multiple, mutually exclusive blocks.

The following is a script with two variables `a` and `b` which prints the larger of the two values, or a surprised exclamation if they are equal!

In [None]:
a = 10
b = 20

if a > b:
  print(a)
elif a < b:
  print(b)
else:
  print("Values are the same!")

20


It would be more prudent to turn this logic into a function taking any two numbers `a` and `b`:

In [None]:
def maximum(a, b):
  if a > b:
    return a
  else:
    return b


maximum(10, 20)

20

Notice that the `else` statement is unnecessary in this case due to the `return` statements. That is, if `a > b`, the function returns anyway, so nothing afterwards will be evaluated; therefore, the above is functionally identical to:

In [None]:
def maximum_2(a, b):
  if a > b:
    return a
  return b


maximum_2(10, 20)

20

For a simple, one-line conditional such as this, a *conditional expression* of the form `value_if_true if predicate else value_if_false` can be used instead. This is also often known as the *ternary operator* (although this technically refers to *any* operator taking 3 arguments).

In [None]:
def maximum_3(a, b):
  return a if a > b else b


maximum_3(10, 20)

20

Of course, doing any of this would be entirely unnecessary as `max` is already a built-in function in Python taking as many arguments as you like. It's important to understand how things work under the hood, but equally important to not reinvent the wheel!

In [None]:
max(2, 10, 20, 15)

20

## Loops (`while`; `for`)

Another common task in programming is to repeat a statement or block of statements multiple times. A `while` block continually repeats as long as a predicate remains `True`, terminating and moving on with the script once it becomes `False`:

In [None]:
x = 0

while x < 10:
  print(x)
  x += 1

print("Done!")

0
1
2
3
4
5
6
7
8
9
Done!


Generally, `for` loops are used for more sophisticated iteration. We will look at collections of values in the next section, but concisely, a `for` loop allows you to perform an operation on each value in a collection:

In [None]:
for value in range(10):
  print(value)

print("Done!")

0
1
2
3
4
5
6
7
8
9
Done!


# Collections

## Lists

In [None]:
data = list(range(100))
print(data)

[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99]


## ✍️ Exercise: Generating lists

Using list comprehension, or otherwise, generate a list named `data` containing the first 500 values of $f(x)$ (that is, starting with $x=0,x=1$ etc.) where
$$f(x) = \begin{cases}x^2, & x<10,\\x+100, & x \geq 10. \end{cases}$$


<h2>👇</h2>

In [None]:
data = [x**2 if x < 10 else x + 100 for x in range(500)]

🟢

In [None]:
# Output should be:
# [0, 1, 4, 9, 16, 25, 36, 49, 64, 81, 110, 111, 112, ... , 598, 599]
# 500

print(data)
print(len(data))

[0, 1, 4, 9, 16, 25, 36, 49, 64, 81, 110, 111, 112, 113, 114, 115, 116, 117, 118, 119, 120, 121, 122, 123, 124, 125, 126, 127, 128, 129, 130, 131, 132, 133, 134, 135, 136, 137, 138, 139, 140, 141, 142, 143, 144, 145, 146, 147, 148, 149, 150, 151, 152, 153, 154, 155, 156, 157, 158, 159, 160, 161, 162, 163, 164, 165, 166, 167, 168, 169, 170, 171, 172, 173, 174, 175, 176, 177, 178, 179, 180, 181, 182, 183, 184, 185, 186, 187, 188, 189, 190, 191, 192, 193, 194, 195, 196, 197, 198, 199, 200, 201, 202, 203, 204, 205, 206, 207, 208, 209, 210, 211, 212, 213, 214, 215, 216, 217, 218, 219, 220, 221, 222, 223, 224, 225, 226, 227, 228, 229, 230, 231, 232, 233, 234, 235, 236, 237, 238, 239, 240, 241, 242, 243, 244, 245, 246, 247, 248, 249, 250, 251, 252, 253, 254, 255, 256, 257, 258, 259, 260, 261, 262, 263, 264, 265, 266, 267, 268, 269, 270, 271, 272, 273, 274, 275, 276, 277, 278, 279, 280, 281, 282, 283, 284, 285, 286, 287, 288, 289, 290, 291, 292, 293, 294, 295, 296, 297, 298, 299, 300, 301, 302

# Search algorithms

## Linear search

## ✍️ Exercise: Linear search

<h2>👇</h2>

In [None]:
def linear_search_iterative(array, value):
  """Query if a value is in an array via iterative linear search.

  Args:
    array: List of elements to query.
    value: Value to query presence of.
  
  Returns:
    True if value is in array, False otherwise.

  """
  for elt in array:
    if elt == value:
      return True
  
  return False

🟢

In [None]:
# Output should be:
# (True, True, True, False)

(linear_search_iterative(data, 81),
 linear_search_iterative(data, 0),
 linear_search_iterative(data, 599),
 linear_search_iterative(data, -2))

(True, True, True, False)

## Binary search

### Recursive approach

### ✍️ Exercise: Recursive binary search

<h2>👇</h2>

In [None]:
def binary_search_recursive(array, value):
  """Query if a value is in an array via recursive binary search.

  Args:
    array: Sorted (ascending) list of elements to query.
    value: Value to query presence of.
  
  Returns:
    True if value is in array, False otherwise.
    
  """
  # Base cases for empty or singular list
  n = len(array)
  if n == 0:
    return False
  elif n == 1:
    return array[0] == value

  # Recursive case
  middle = n // 2
  if array[middle] == value:
    return True  
  elif array[middle] < value:
    return binary_search_recursive(array[middle + 1:], value)
  else:
    return binary_search_recursive(array[:middle], value)

🟢

In [None]:
# Output should be:
# (True, True, True, False)

(binary_search_recursive(data, 81),
 binary_search_recursive(data, 0),
 binary_search_recursive(data, 599),
 binary_search_recursive(data, -2))

(True, True, True, False)

### Iterative approach

### ✍️ Exercise: Iterative binary search

<h2>👇</h2>

In [None]:
def binary_search_iterative(array, value):
  """Query if a value is in an array via iterative binary search.

  Args:
    array: Sorted (ascending) list of elements to query.
    value: Value to query presence of.
  
  Returns:
    True if value is in array, False otherwise.

  """  
  # Iteration terminates when (low, high) range has shrunk such that low > high
  # (named low/high so the built-in min and max functions are not shadowed)
  low = 0
  high = len(array) - 1
  while low <= high:
    middle = (low + high) // 2
    if array[middle] == value:
      return True
    elif array[middle] < value:
      low = middle + 1
    else:
      high = middle - 1
  
  return False

🟢

In [None]:
# Output should be:
# (True, True, True, False)

(binary_search_iterative(data, 81),
 binary_search_iterative(data, 0),
 binary_search_iterative(data, 599),
 binary_search_iterative(data, -2))

(True, True, True, False)

## ✍️ Exercise: Recursive linear search
We have seen that binary search can be implemented recursively or iteratively. In general, any problem that can be solved recursively can be solved iteratively, and vice versa (see *Turing completeness*).

Write a recursive implementation of linear search as `linear_search_recursive`. Hint: linear search checks the first element, before moving onto 'the next'.


<h2>👇</h2>

In [None]:
def linear_search_recursive(array, value):
  """Query if a value is in an array via recursive linear search.

  Args:
    array: List of elements to query.
    value: Value to query presence of.
  
  Returns:
    True if value is in array, False otherwise.
    
  """
  # Base case for empty list
  n = len(array)
  if n == 0:
    return False

  # Recursive case
  if array[0] == value:
    return True
  else:
    return linear_search_recursive(array[1:], value)

🟢

In [None]:
# Output should be:
# (True, True, True, False)

(linear_search_recursive(data, 81),
 linear_search_recursive(data, 0),
 linear_search_recursive(data, 599),
 linear_search_recursive(data, -2))

(True, True, True, False)

# Complexity analysis

## Big O notation

Take a function $f(x)$ such as

$$f(x) = 2x^2 + 3x + 4$$

This is a quadratic function with 3 terms: a quadratic, linear and constant term. Of these, the quadratic term is the fastest-growing; that is, **for sufficiently large $x$** (i.e. $x \to \infty$) the $2x^2$ term will dwarf the others. Asymptotically, the quadratic term is sufficient to characterise the growth of the function, so we say that the *order* of the function is quadratic, written as

$$f(x) \in O(x^2)$$

or simply

$$f(x) = O(x^2)$$

The big O notation indicates an *upper bound* on the growth of the function $f(x)$ &ndash; here, we are stating that $f(x)$ *grows no faster* than any constant multiple of $x^2$. Given a function, it is usually straightforward to state its order by inspection, by citing only the fastest-growing term, ignoring constant factors (the $2$ is dropped).

As an upper bound, it is also correct to say that $f(x)=O(x^3)$, $f(x)=O(x^{42})$, $f(x)=O(e^{e^x})$ etc., but naturally this is not particularly informative, as it is the minimisation of the upper bound which is of interest.
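
A numerical check can make the upper bound tangible. In the sketch below, the ratio $f(x)/x^2$ settles towards the dropped constant $2$, and $f(x)$ stays below $3x^2$ from $x=4$ onwards. The constant $3$ and the threshold $4$ are one valid choice among many, picked here for illustration.

```python
def f(x):
  return 2 * x**2 + 3 * x + 4

# The ratio approaches 2 as x grows: the lower-order terms stop mattering.
for x in [1, 10, 100, 1000, 10000]:
  print(x, f(x) / x**2)

# f(x) is bounded above by the constant multiple 3 * x**2 once x >= 4.
bounded = all(f(x) <= 3 * x**2 for x in range(4, 10001))
print(bounded)  # True
```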


## Linear search



The complexity of an algorithm can be analysed in four steps:

1.   Decide the *elementary operation* for the algorithm. Here, it is a comparison between two values.
2.   Decide on a *case* to analyse. Here, we are interested in the *worst case*, i.e. the maximum number of comparisons.
3.   Form an expression $T(n)$ for the number of elementary operations in terms of the input size $n$.
4.   Report the complexity as the order (big O) of $T(n)$.

The worst case of linear search is if the element to be found is in the very last position, or not in the list at all &ndash; both cases maximise the number of comparisons performed.

The analysis for linear search is trivial: in the worst case, $n$ comparisons are performed:

\begin{align}
T(n) &= n \\
\therefore \quad T(n) &= O(n)
\end{align}

In conclusion, we can say that this algorithm runs in *linear time* (hence its name) in the worst case.

The best case is the search value being in the very first position, so only a single comparison will ever be performed, i.e. $T(n)=1=O(1)$, described as *constant-time*. In the average case, assuming the search value is equally likely to be at any position, it is found around the middle of the list, so

\begin{align}
T(n) &= \frac{n}{2} \\
\therefore \quad T(n) &= O(n)
\end{align}

So, the best case is trivially constant-time, and the worst-case and average-case complexities are of the same order. Both facts are true for a lot of algorithms... but not always, as we will see!
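
The three cases can be observed directly by counting comparisons. The function below is a sketch written for this purpose (`count_linear_comparisons` and `sample` are names introduced here, not part of the exercises); it mirrors `linear_search_iterative` but returns the number of comparisons instead of a Boolean.

```python
def count_linear_comparisons(array, value):
  """Return the number of equality comparisons linear search performs."""
  comparisons = 0
  for elt in array:
    comparisons += 1
    if elt == value:
      break
  return comparisons


sample = list(range(1, 101))  # n = 100, holding the values 1..100
print(count_linear_comparisons(sample, 1))    # Best case: 1
print(count_linear_comparisons(sample, 100))  # Worst case (last position): 100
print(count_linear_comparisons(sample, -2))   # Worst case (absent): 100

# Average over every possible position of a value that is present
average = sum(count_linear_comparisons(sample, v) for v in sample) / len(sample)
print(average)  # 50.5, i.e. (n + 1) / 2, which is roughly n / 2
```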


## Binary search

The analysis is not so trivial for binary search. We begin, as always, by selecting an elementary operation (comparisons) and forming an expression to count the number of operations for an input of size $n$. 

A step (iteration/recursion) of binary search on a set of size $n$ comprises a single comparison, followed by the same step on a set of size $n/2$:

\begin{equation}
T(n) = 1+T{\left(\frac{n}{2}\right)}
\end{equation}

This forms a recurrence relation, which in this case is straightforward to resolve by inspection. Here, we expand it twice (substitute it into itself), then simplify it by introducing a parameter $k$ as the number of steps (a technique known as *telescoping*):

\begin{align}
T(n) &= 1+1+T{\left(\frac{n}{4}\right)} \\
&= 1+1+1+T{\left(\frac{n}{8}\right)} \\
&\;\;\vdots \\
&= k+T{\left(\frac{n}{2^k}\right)} \\
\end{align}

The base case is a set of size 1, achieved after halving the set $k$ times:

\begin{align}
\frac{n}{2^k} &= 1 \\
\implies \quad k &= \log_2{n}
\end{align}

Substituting this back in gives

\begin{align}
T(n) &= \log_2{n} + T(1)
\end{align}

A set of size 1 requires 1 comparison to search, so $T(1)=1$:

\begin{align}
T(n) &= \log_2{n} + 1\\
\therefore \quad T(n) &= O(\log{n})
\end{align}

Therefore, we can say that binary search runs in *logarithmic time* in the worst case: a definite improvement over linear time. It is common to leave the base out of the logarithm when reporting bounds, as logarithms of different bases are interchangeable by a constant factor.
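
The constant factor comes from the change-of-base identity. As a short worked example (base 10 is picked purely for illustration):

\begin{align}
\log_a{n} &= \frac{\log_b{n}}{\log_b{a}} \\
\implies \quad \log_2{n} &= \frac{\log_{10}{n}}{\log_{10}{2}} \approx 3.32\,\log_{10}{n}
\end{align}

Since $1/\log_{10}{2}$ does not depend on $n$, it is dropped like any other constant factor, so $O(\log_2{n})$ and $O(\log_{10}{n})$ describe the same order.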

NB: As we are dealing with integer values which may not necessarily be powers of 2, the formula should more correctly be $T(n)=\lfloor\log_2{n}+1\rfloor$, where $\lfloor\cdot\rfloor$ is the *floor* function, which simply rounds down to the nearest integer. This is because there are always $\lfloor\log_2{n}+1\rfloor$ levels in a balanced binary tree with $n$ nodes, and one comparison is performed for each level. For example, for $n=8$, there are $\lfloor\log_2(8)+1\rfloor=4$ levels, and one comparison is performed at each level:

<img src="https://drive.google.com/uc?export=view&id=1Dcx-WpUCD3crrpuUJ8P8B3qFRYbFPF5a" width="30%"/>


The best case of binary search is, again, the element being in the first position queried, which this time is the middle position, yielding $O(1)$. Derivation of the average-case complexity for binary search is non-trivial and out of scope, but rest assured that it is $O(\log{n})$ as with the worst case.
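
The worst-case formula can be checked empirically. The sketch below counts the steps taken by an iterative binary search, treating each inspected element as one elementary comparison as in the analysis above. `count_binary_steps` is a helper introduced here for illustration, and the list sizes are arbitrary.

```python
import math


def count_binary_steps(array, value):
  """Return the number of elements binary search inspects (one step each)."""
  low, high = 0, len(array) - 1
  steps = 0
  while low <= high:
    middle = (low + high) // 2
    steps += 1
    if array[middle] == value:
      break
    elif array[middle] < value:
      low = middle + 1
    else:
      high = middle - 1
  return steps


for n in [1, 8, 100, 1000, 4096]:
  array = list(range(n))
  # Worst case over every value that is present in the list
  worst = max(count_binary_steps(array, v) for v in array)
  predicted = math.floor(math.log2(n)) + 1
  print(n, worst, predicted)
```

For each size, the measured worst case matches $\lfloor\log_2{n}\rfloor+1$; for $n=8$ both are 4.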

Finally, note that we can also solve the complexity of linear search in its recursive form by telescoping, showing that its iterative and recursive forms are equivalent:

\begin{align}
T(n) &= 1 + T(n-1) \\
&= 1 + 1 + T(n-2) \\
&= 1 + 1 + 1 + T(n-3) \\
&\;\;\vdots \\
&= k+T(n-k) \\
n-k &= 1 \\
\implies \quad k &= n-1 \\
\implies \quad T(n) &= n - 1 + T(1) \\
&= n-1+1 \\
&= n
\end{align}
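
The closed forms can also be sanity-checked by evaluating the two recurrences directly. This is a sketch under one assumption: for binary search, $n/2$ is rounded down (`n // 2`) so that the recurrence is defined for every integer $n$. The function names are introduced here for illustration.

```python
def linear_steps(n):
  """T(n) = 1 + T(n - 1), with T(1) = 1."""
  return 1 if n == 1 else 1 + linear_steps(n - 1)


def binary_steps(n):
  """T(n) = 1 + T(n // 2), with T(1) = 1 (halving rounds down)."""
  return 1 if n == 1 else 1 + binary_steps(n // 2)


for n in [1, 8, 100, 500]:
  print(n, linear_steps(n), binary_steps(n))
```

The linear recurrence evaluates to $n$ in every case, while the binary recurrence gives 1, 4, 7 and 9, matching $\lfloor\log_2{n}\rfloor+1$.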

# Code profiling

## ✍️ Exercise: Timing experiments

Solution repeating a large task for a few (`n`) runs, reporting the minimum:

In [None]:
import time


def time_function(function, n):
  times = [None] * n
  for i in range(n):
    t0 = time.perf_counter()
    function()
    t1 = time.perf_counter()
    times[i] = t1 - t0
  return min(times)


big_data = list(range(100000000))
n = 10
# Search big_data for its last element, the worst case for linear search
print("Linear search:             "
      + str(time_function(lambda: linear_search_iterative(big_data, len(big_data) - 1), n))
      + " s")
print("Binary search (recursive): "
      + str(time_function(lambda: binary_search_recursive(big_data, len(big_data) - 1), n))
      + " s")
print("Binary search (iterative): "
      + str(time_function(lambda: binary_search_iterative(big_data, len(big_data) - 1), n))
      + " s")


Solution repeating a medium task for many runs, reporting the total:

In [None]:
import timeit

big_data = list(range(10000))
# Search big_data for its last element, the worst case for linear search
print("Linear search:             "
      + str(timeit.timeit(lambda: linear_search_iterative(big_data, len(big_data) - 1)))
      + " s")
print("Binary search (recursive): "
      + str(timeit.timeit(lambda: binary_search_recursive(big_data, len(big_data) - 1)))
      + " s")
print("Binary search (iterative): "
      + str(timeit.timeit(lambda: binary_search_iterative(big_data, len(big_data) - 1)))
      + " s")
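
When called without further arguments, `timeit.timeit` runs the statement one million times (its default `number`) and returns the total time in seconds. The sketch below sets `number` explicitly and converts the total to a per-run figure; `busy_work` is a hypothetical stand-in task, and the run counts are arbitrary.

```python
import timeit


def busy_work():
  """Stand-in task (hypothetical): sum the first thousand integers."""
  return sum(range(1000))


runs = 1000
total = timeit.timeit(busy_work, number=runs)  # Total seconds for all runs
print("Total:     " + str(total) + " s")
print("Per run:   " + str(total / runs) + " s")

# timeit.repeat returns several totals; the minimum is least disturbed by other processes
totals = timeit.repeat(busy_work, repeat=5, number=runs)
print("Best of 5: " + str(min(totals) / runs) + " s per run")
```

Choosing a smaller `number` keeps slow tasks practical to time, at the cost of noisier measurements.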


# ➕ Extra: Self-organising lists