<a href="https://colab.research.google.com/github/hr-ge/Python-for-clinicians/blob/main/week1.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Week 1: "Hello, World!", NumPy arrays, and more

The purpose of this notebook is to introduce the reader to basic programming concepts through Python (https://www.python.org/), a high-level, general-purpose programming language. We will learn how to do basic algebra using Python and explore fundamental building blocks of programs such as variables, loops, and if-statements. We will also explore one of Python's most important libraries for data science - NumPy (https://numpy.org/).

# Python as a calculator


In [None]:
2 + 1

3

In [None]:
2 - 1

1

In [None]:
3 * 2

6

In [None]:
4 / 2

2.0

In [None]:
5 % 2 # This is the modulus operator and this is a comment.

1

In [None]:
4 ** 2

16
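
The `2.0` printed for `4 / 2` above hints that `/` always produces a decimal number. As a small sketch of how the division-related operators differ (the floor-division operator `//` is not used elsewhere in this notebook and is shown here only for comparison):

```python
# True division always returns a float, even when the result is a whole number.
true_division = 7 / 2     # 3.5

# Floor division rounds the result down to a whole number.
floor_division = 7 // 2   # 3

# The modulus operator returns the remainder of that division.
remainder = 7 % 2         # 1

print(true_division, floor_division, remainder)
```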

# Everyone's first program: "Hello, World!"

It is a decades-long tradition for beginner programmers like yourself to print a "Hello, World!" message to the console as their first program. It is a simple yet effective way of getting a taste of what programming is all about (https://medium.com/the-software-guild-blog/the-history-of-hello-world-175440f77776).

In [None]:
print("Hello, World!")

Hello, World!


Although simple, there is already a lot going on in this program. First of all, we have the ```print()``` function, which is one of Python's built-in functions (https://docs.python.org/3/library/functions.html). What we pass within the parentheses of the ```print()``` function is known as an *argument*. This particular argument's *type* is what we call a *string* (```str``` in Python).

**Takeaway message:** if you want to display a piece of text on the screen, put it within single or double quotation marks ('Hello, World!' or "Hello, World!") and pass it to the ```print()``` function.
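
As a quick illustration of the points above, the following sketch prints the same text using both kinds of quotation marks and then asks Python for the type of the argument via the built-in `type()` function:

```python
print('Hello, World!')         # single quotation marks
print("Hello, World!")         # double quotation marks
print(type("Hello, World!"))   # the type of the argument: a string (str)
```

Both spellings produce exactly the same string, so which one you use is mostly a matter of taste.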

# Variables

One's understanding of variables as a mathematical concept is almost fully applicable to programming. In mathematics, a variable is a symbol that refers to a value that can change. In programming, a variable is a name that refers to a value stored in memory, and the value it refers to can be changed. In computer science, we say that we *assign* a value to a variable. Note that ```=``` is the assignment operator and does not stand for equality!

In [None]:
x = 1
y = 2
print("x + y = ", x + y)

x = 3
y = 4
print("x + y = ", x + y)

x + y =  3
x + y =  7


Notice that Python code is executed line-by-line and that print statements can be a lot more complicated than what we previously saw.

In [7]:
pi = 3.14

print("Memory address of pi:", hex(id(pi)))

Memory address of pi: 0x789244475b70
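
Returning to the point that `=` assigns rather than compares, here is a minimal sketch. It uses the comparison operator `==`, which is covered in more detail in the section on the bool type:

```python
x = 1           # assignment: the name x now refers to the value 1
print(x == 1)   # comparison: asks whether x is equal to 1

x = x + 1       # fine as an assignment, although it would make no sense as an equation
print(x)
print(x == 1)   # x has changed, so the comparison now gives a different answer
```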


# Exercise 1

You are given 5 minutes to write a small script that prints out your name and age. Note that although your name is permanent, your age is not. Store your name and age in variables. You are allowed to use as many ```print()``` statements as you want but bonus points are given if you use as few as possible.



Hello, my name is Hristo and I am 24 years old.


# Lists

Data structures are essential in programming, as they allow us to organize data in an efficient manner. Our first taste of data structures comes in the form of *lists*. But what is a list? Essentially, a list is a data structure that contains *n* elements, for example, strings, integers, etc. Here are some important properties of Python lists:


*   Lists are indexable on the range [0, n-1].
*   Lists are dynamic, meaning we can grow and shrink their size by adding and removing elements.
*   Lists are mutable, meaning we can change the value stored at any given index.



In [None]:
ints   = [1, 2, 3]
floats = [1.61, 2.71, 3.14]
words  = ["Denmark", "Copenhagen", "hospital"]

In [None]:
print(ints)
print(floats)
print(words)

[1, 2, 3]
[1.61, 2.71, 3.14]
['Denmark', 'Copenhagen', 'hospital']


What if you want a single element from a given list? This is known as indexing.

In [None]:
print(ints[0])      # Access the first element in the ints list.
print(floats[1])    # Access the second element in the floats list.
print(words[2])     # Access the third element in the words list.

1
2.71
hospital


What if you want multiple elements from a given list that appear in sequence? This is known as slicing.

In [None]:
print(ints[0:2])

[1, 2]
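
A slice `[start:stop]` includes the element at `start` but excludes the one at `stop`. The sketch below shows a few common variations on a hypothetical list called `letters`, which is not part of the notebook's own examples:

```python
letters = ["a", "b", "c", "d", "e"]   # hypothetical example list

first_two = letters[0:2]     # start is included, stop is excluded
from_third = letters[2:]     # omitting stop runs to the end of the list
up_to_third = letters[:3]    # omitting start begins at index 0
last_item = letters[-1]      # negative indices count from the end

print(first_two, from_third, up_to_third, last_item)
```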


In [None]:
print("Original: ", ints)
ints.append(4)
print("Appended: ", ints)
ints.remove(1)
print("Removed:  ", ints)

Original:  [1, 2, 3]
Appended:  [1, 2, 3, 4]
Removed:   [2, 3, 4]


In [None]:
ints[2] = 5
print("Mutated: ", ints)

Mutated:  [2, 3, 5]


**Note:** in reality, Python lists are a type of dynamic array. This is not something you should necessarily worry about at this stage. However, consider the following video if you would like to learn more about dynamic arrays: https://www.youtube.com/watch?v=PEnFFiQe1pM&list=PLDV1Zeh2NRsB6SWUrDFW2RmDotAfPbeHu&index=5

Lists can also be multi-dimensional.

In [None]:
list2d = [[0, 1, 2], [3, 4, 5]]
list3d = [[[0, 1, 2], [3, 4, 5]], [[6, 7, 8], [9, 10, 11]]]

Indexing multi-dimensional lists is conceptually the same as indexing a simple 1D list.

In [None]:
print(list2d[0][1])
print(list3d[1][0][1])

1
7


# For-loops

For-loops are defined using the ```for``` keyword. They require a sequence to loop through and a properly indented body that defines what to do at each step.

In [None]:
for i in range(10):
    print(i)

0
1
2
3
4
5
6
7
8
9


```range()``` is another built-in function that takes an integer as its argument and returns an immutable sequence type. ```i``` is a variable that holds a different value at each step of the for-loop.
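
Besides a single stopping value, `range()` also accepts an optional start and step. A short sketch, converting each range to a list so that its contents can be printed:

```python
print(list(range(5)))          # stop only: starts at 0 and excludes 5
print(list(range(2, 6)))       # start and stop
print(list(range(0, 10, 2)))   # start, stop, and a step of 2
```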

We saw above that when we use the ```print()``` function to display the contents of a list, it will print all of them together on the same line. What if we intend a different behavior? What if we want the contents printed one-by-one?

In [None]:
print("Start looping...")

for word in words:
    print("----------")
    print(word)
    print("----------")

print("...stop looping!")

Start looping...
----------
Denmark
----------
----------
Copenhagen
----------
----------
hospital
----------
...stop looping!


Indentation in Python matters! Notice that the lines that are indented after we define the for-loop are executed at each iterative step. On the other hand, the non-indented lines before and after the for-loop definition are executed only once.
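
To see the effect of indentation on a second, smaller example, consider this sketch that adds up the numbers in a short list. The variable names are chosen purely for illustration:

```python
total = 0

for value in [1, 2, 3]:
    total = total + value            # indented: runs once per element
    print("Running total:", total)   # indented: runs once per element

print("Final total:", total)         # not indented: runs once, after the loop
```

Moving the last `print()` call inside the loop (by indenting it) would print the final-total line three times instead of once.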

# The bool type and the if-statement

Logic is essential for programming as it allows us to evaluate whether a given statement is true or false. ```True``` and ```False``` are values of type ```bool```, named after the English mathematician George Boole, a pioneer in the field of logic. The standard operations ```and```, ```or```, and ```not``` can be used to manipulate boolean values.



In [None]:
isTrue  = True
isFalse = False

In [None]:
print(isTrue and isTrue)
print(isTrue and isFalse)
print(isFalse and isTrue)
print(isFalse and isFalse)

True
False
False
False


In [None]:
print(isTrue or isTrue)
print(isTrue or isFalse)
print(isFalse or isTrue)
print(isFalse or isFalse)

True
True
True
False


In [None]:
print(not isTrue)
print(not isFalse)

False
True


We can check if two boolean values are the same by using the equality operator ```==```.



In [None]:
print(isTrue == isTrue)
print(isTrue == isFalse)
print(isFalse == isTrue)
print(isFalse == isFalse)

True
False
False
True


The reason we care about these values and operations is that they allow us to construct one of the cornerstones of programming: the *if-statement*. The importance of the if-statement lies in the fact that it allows us to control the flow of programs based on certain conditions.

In [None]:
if isTrue == True:
    print("This is true!")

This is true!


In [None]:
if isFalse == True:
    print("This will not print!")
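
In practice, the condition of an if-statement is usually the result of a comparison rather than a hard-coded `True` or `False`. The sketch below assumes a hypothetical temperature reading and an arbitrary cut-off value, both invented for illustration, and uses `else` to handle the case where the condition is false:

```python
temperature = 38.5   # hypothetical reading in degrees Celsius
threshold = 38.0     # arbitrary cut-off, for illustration only

is_above = temperature >= threshold   # a comparison produces a bool

if is_above:
    message = "Above threshold."
else:
    message = "Below threshold."

print(is_above)
print(message)
```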

# Worked example

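Now let's put everything we have learned so far into a worked example. For this purpose, we will use a list containing the integers from 0 to 9, followed by the string "10", which is included on purpose to show what happens when a value is not an integer. We will loop through the list and check whether the current value ```n``` is an integer. If it is, we print "Even." when it is even and "Odd." when it is odd. Otherwise, we print a message saying that ```n``` must be an integer.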



In [None]:
nums = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, "10"]

In [None]:
for n in nums:
    if type(n) is int:
        if n % 2 == 0:
            print("Even.")
        elif n % 2 == 1:
            print("Odd.")
    else:
        print("n must be of type integer.")

Even.
Odd.
Even.
Odd.
Even.
Odd.
Even.
Odd.
Even.
Odd.
n must be of type integer.


# Exercise 2

You are given two Python lists of x values (```xs```) and y values (```ys```). Your goal is to produce a Python list of z values (```zs```), where each z is the element-wise sum of ```xs``` and ```ys```. In theory, this is what the code below does. In practice, there are 3 mistakes that prevent the program from working correctly. Copy the code into your own notebook and try to find all 3 mistakes. The resulting print should look like ```[5, 7, 9, 11, 13]```.

In [None]:
xs = [0, 1, 2, 3, 4]
ys = [5, 6, 7, 8, 9]
zs = []

for j in range(0, 5):
    x = xs[i]
    y = ys[i]
    z == x + y
zs.append(z)

print(zs)

[5, 7, 9, 11, 13]


# Functions

Finally, let's take a look at functions as a programming concept. In simple terms, a Python function is a named piece of code designed to carry out a specific task. Generally speaking, programmers are encouraged to keep functions simple and dedicated, i.e. a function should perform one task and one task only. Functions improve readability and encourage code reuse.

In [None]:
def isEven(n):
    """
    This is a docstring.

    Determine whether a given number is even.

    Arguments:
    n (int): The number to check.

    Returns:
    bool or None: True if the number is even, False if it is odd,
                  and None if the input is not an integer.
    """
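    # Use a separate name for the result so that it does not shadow the function's own name.
    result = None  # Default: returned when n is not an integer.

    if type(n) is int:
        result = n % 2 == 0  # True if n is even, False if it is odd.

    return result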


In [None]:
for n in nums:
    print(isEven(n))
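
To illustrate the idea of a small, dedicated function that can be reused, here is a minimal sketch. The function `double()` and the list of values are hypothetical and serve only as an example:

```python
def double(n):
    """Return twice the given number."""
    return n * 2


doubled_values = []

for value in [1, 2, 3]:
    doubled_values.append(double(value))   # the same function is reused for every element

print(doubled_values)
```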

# Recursion

A recursive function is one that calls itself until a base case is met. The base case serves as the termination condition and is necessary for the function to stop running.
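
Before looking at a longer example, the roles of the base case and the recursive case can be sketched with the factorial function. The function `factorial()` below is an illustrative helper that assumes a non-negative integer as input:

```python
def factorial(n):
    """Return n! for a non-negative integer n (illustrative helper)."""
    if n <= 1:                     # base case: stops the recursion
        return 1
    return n * factorial(n - 1)    # recursive case: a smaller version of the same problem


print(factorial(5))   # 5 * 4 * 3 * 2 * 1
```

Without the base case, the function would keep calling itself until Python raises a `RecursionError`.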

In [1]:
def fibonacci(n):
    """
    Recursively generate the first n Fibonacci numbers.

    Arguments:
        n (int): The number of Fibonacci numbers to generate.

    Returns:
        list: A list of Fibonacci numbers.
    """
    # Base cases
    if n <= 0:
        return []
    elif n == 1:
        return [0]
    elif n == 2:
        return [0, 1]

    # Recursive case
    fib_sequence = fibonacci(n - 1)
    fib_sequence.append(fib_sequence[-1] + fib_sequence[-2])

    return fib_sequence

In [6]:
print(fibonacci(100))

[0, 1, 1, 2, 3, 5, 8, 13, 21, 34, 55, 89, 144, 233, 377, 610, 987, 1597, 2584, 4181, 6765, 10946, 17711, 28657, 46368, 75025, 121393, 196418, 317811, 514229, 832040, 1346269, 2178309, 3524578, 5702887, 9227465, 14930352, 24157817, 39088169, 63245986, 102334155, 165580141, 267914296, 433494437, 701408733, 1134903170, 1836311903, 2971215073, 4807526976, 7778742049, 12586269025, 20365011074, 32951280099, 53316291173, 86267571272, 139583862445, 225851433717, 365435296162, 591286729879, 956722026041, 1548008755920, 2504730781961, 4052739537881, 6557470319842, 10610209857723, 17167680177565, 27777890035288, 44945570212853, 72723460248141, 117669030460994, 190392490709135, 308061521170129, 498454011879264, 806515533049393, 1304969544928657, 2111485077978050, 3416454622906707, 5527939700884757, 8944394323791464, 14472334024676221, 23416728348467685, 37889062373143906, 61305790721611591, 99194853094755497, 160500643816367088, 259695496911122585, 420196140727489673, 679891637638612258, 110008777

# Scientific computing with NumPy

NumPy (https://numpy.org/) is a core library for scientific computing in Python. It is essential for other commonly used libraries such as Pandas, Scikit-Learn, and SciPy among many others. NumPy owes its popularity to its efficient implementation of n-dimensional arrays that allows for seamless linear algebra operations.

In [None]:
import numpy as np

In [None]:
array1d = np.array([0, 1, 2])
array2d = np.array([[0, 1, 2],
                    [3, 4, 5],
                    [6, 7, 8]])
array3d = np.array([[[0, 1, 2],
                    [3, 4, 5],
                    [6, 7, 8]],
                    [[0, 1, 2],
                    [3, 4, 5],
                    [6, 7, 8]],
                    [[0, 1, 2],
                    [3, 4, 5],
                    [6, 7, 8]]])

In [None]:
print(array1d)
print("----------")
print(array2d)
print("----------")
print(array3d)

# Indexing and slicing a NumPy array

Basic indexing and slicing of NumPy arrays works in the same way as indexing and slicing Python lists.

In [None]:
print(array1d[0])
print("----------")
print(array2d[1][2])
print("----------")
print(array3d[2][1][0])

In [None]:
print(array1d[0:2])
print("----------")
print(array2d[0:2])
print("----------")
print(array3d[1][0:2])
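
In addition to the list-style indexing shown above, NumPy arrays accept all indices inside a single pair of square brackets, separated by commas. A small sketch, using a hypothetical array `grid` that holds the same values as `array2d`:

```python
import numpy as np

grid = np.array([[0, 1, 2],
                 [3, 4, 5],
                 [6, 7, 8]])   # same values as array2d

chained = grid[1][2]     # list-style indexing, one axis at a time
combined = grid[1, 2]    # NumPy-style indexing, both axes at once
column = grid[:, 0]      # every row, first column

print(chained, combined)
print(column)
```

The comma form is not available for plain Python lists, which makes selecting a whole column considerably easier with NumPy.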

# NumPy array properties

Now we will take a closer look at some useful NumPy array properties. We will focus on ```array2d```, as it is complex and interesting, yet intuitive to work with.



In [None]:
print("The shape of array2d is: ", array2d.shape)
print("The dimensionality of array2d is: ", array2d.ndim)
print("The size of array2d is: ", array2d.size)
print("The type of data stored in array2d is: ", array2d.dtype)

We can manipulate the shape of the array through the ```np.reshape()``` function.

In [None]:
array2d_reshaped = np.reshape(array2d, (1, 9))

In [None]:
print("The shape of array2d_reshaped is: ", array2d_reshaped.shape)
print("The dimensionality of array2d_reshaped is: ", array2d_reshaped.ndim)
print("The size of array2d_reshaped is: ", array2d_reshaped.size)
print("The type of data stored in array2d_reshaped is: ", array2d_reshaped.dtype)

It is perhaps a bit counter-intuitive that the reshaped array is still 2D. Let's see why this is the case.

In [None]:
print(array2d_reshaped)

What if we want it to be 1D?

In [None]:
array2d_flattened = array2d.flatten()

In [None]:
print("The shape of array2d_flattened is: ", array2d_flattened.shape)
print("The dimensionality of array2d_flattened is: ", array2d_flattened.ndim)
print("The size of array2d_flattened is: ", array2d_flattened.size)
print("The type of data stored in array2d_flattened is: ", array2d_flattened.dtype)

How are ```array2d_reshaped``` and ```array2d_flattened``` different?



In [None]:
print("The first element of array2d_reshaped: ", array2d_reshaped[0])
print("The first element of array2d_flattened: ", array2d_flattened[0])
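
The difference can also be read directly from the shapes. The sketch below rebuilds both variants from a hypothetical array `grid` with the same values as `array2d`, so that it can be run on its own:

```python
import numpy as np

grid = np.arange(9).reshape(3, 3)       # same values as array2d

row_vector = np.reshape(grid, (1, 9))   # 2D: one row with nine columns
flat = grid.flatten()                   # 1D: nine elements

print(row_vector.shape, row_vector[0].shape)   # indexing the 2D array returns a whole row
print(flat.shape, flat[0])                     # indexing the 1D array returns a single number
```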

# Element-wise operations in NumPy

NumPy supports the same arithmetic operators as plain Python, applied element by element. We will take a look at them using two 2D arrays.

In [None]:
m1 = np.array([[0, 1],
               [2, 3]])

m2 = np.array([[4, 5],
               [6, 7]])

In [None]:
m3_add = m1 + m2
m3_sub = m2 - m1
m3_mul = m1 * m2
m3_div = m3_mul / m2
m3_pow = m1 ** 2

In [None]:
print(m3_add)

In [None]:
print(m3_sub)

In [None]:
print(m3_mul)

In [None]:
print(m3_div)

In [None]:
print(m3_pow)

NumPy also offers a range of useful aggregation functions.

In [None]:
print("The sum of m1: ", np.sum(m1))
print("The mean of m1: ", np.mean(m1))
print("The std of m1: ", np.std(m1))
print("The min of m1: ", np.min(m1))
print("The max of m1: ", np.max(m1))
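
By default, these functions aggregate over all elements of the array. Most of them also accept an `axis` argument to aggregate along rows or columns instead. A minimal sketch with a hypothetical array `m` that holds the same values as `m1`:

```python
import numpy as np

m = np.array([[0, 1],
              [2, 3]])   # same values as m1

column_sums = np.sum(m, axis=0)   # collapse the rows: one sum per column
row_sums = np.sum(m, axis=1)      # collapse the columns: one sum per row

print(column_sums)
print(row_sums)
```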

# Working with matrices

Last but not least, NumPy allows for efficient matrix operations, such as matrix transposition.

In [None]:
m1_transpose = m1.T

In [None]:
print(m1_transpose)

Matrix multiplication.

In [None]:
mat_mul = m1 @ m2
mat_dot = np.dot(m1, m2)

In [None]:
print(mat_mul)
print("----------")
print(mat_dot)

Matrix inversion.

In [None]:
m1_inv = np.linalg.inv(m1)

In [None]:
print(m1_inv)
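
One way to convince ourselves that an inverse is correct is to multiply it with the original matrix and compare the result to the identity matrix. The sketch below uses a hypothetical matrix `m` with the same values as `m1`, and relies on `np.allclose()` because floating-point results are rarely exact:

```python
import numpy as np

m = np.array([[0, 1],
              [2, 3]])   # same values as m1

m_inv = np.linalg.inv(m)
product = m @ m_inv                              # should be (approximately) the identity matrix
is_identity = np.allclose(product, np.eye(2))    # compare with a small numerical tolerance

print(product)
print(is_identity)
```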

Determinant calculation.

In [None]:
m2_det = np.linalg.det(m2)

In [None]:
print(m2_det)

Occasionally, we know the size and dimensionality of the matrix we wish to arrive at, but we are yet to figure out its values. In those cases, computer scientists initialize a dummy matrix that contains the same value everywhere. NumPy offers functionality to do that for us.

In [None]:
array_zeros = np.zeros((3,3))
array_ones  = np.ones((3,3))

In [None]:
print(array_zeros)
print("----------")
print(array_ones)

One such occasion might be if we wanted to implement one of the element-wise operations from scratch. We know that the final matrix should have 4 elements and be 2 $\times$ 2 but we do not know the values yet.

In [None]:
m3 = np.zeros((2,2))

for i in range(m1.shape[0]):
    for j in range(m1.shape[1]):
        m3[i][j] = m1[i][j] * m2[i][j]

print(m3)

# Why should I care about NumPy arrays when we already have Python lists?

Because NumPy arrays are typically much faster for numerical operations on large amounts of data. Let's look at a simple experiment. We will start by importing the ```time``` library. Note that the details of ```time``` are not important for this course.



In [None]:
import time

We will then define a Python list and a NumPy array of the same shape (1D) and size.

In [None]:
size     = 10**5
py_list  = list(range(size))
np_array = np.arange(size)

We will then record the current time at executing the code cell, iterate over the Python list and multiply each of its elements by 2, and finally, get the total runtime by subtracting the starting time from the time of finishing the operation.

In [None]:
start_time     = time.time()
py_list_result = [x * 2 for x in py_list] # This is a list comprehension.
py_list_time   = time.time() - start_time

Do the same for the NumPy array.

In [None]:
start_time      = time.time()
np_array_result = np_array * 2
np_time         = time.time() - start_time

Simply compare the results.

In [None]:
print(f"Python list time: {py_list_time:.6f} seconds")
print(f"NumPy array time: {np_time:.6f} seconds")
print(f"Speedup: {py_list_time / np_time:.2f}x faster with NumPy")
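
A single measurement like the one above can be noisy, because the runtime depends on whatever else the computer is doing at that moment. As an alternative sketch (not the approach used in this notebook), the standard-library `timeit` module can repeat each operation several times and report an average. The number of repetitions below is an arbitrary choice:

```python
import timeit

import numpy as np

size = 10**5
py_list = list(range(size))
np_array = np.arange(size)

repeats = 20   # arbitrary number of repetitions, chosen for illustration

# timeit.timeit() runs the given callable `number` times and returns the total time in seconds.
list_time = timeit.timeit(lambda: [x * 2 for x in py_list], number=repeats) / repeats
numpy_time = timeit.timeit(lambda: np_array * 2, number=repeats) / repeats

print(f"Python list, average per run: {list_time:.6f} seconds")
print(f"NumPy array, average per run: {numpy_time:.6f} seconds")
```

The exact numbers will differ from machine to machine and from run to run.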