<a href="https://colab.research.google.com/github/hr-ge/Python-for-clinicians/blob/main/week1.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Week 1: "Hello, World!", NumPy arrays, and more

The purpose of this notebook is to introduce the reader to basic programming concepts through Python (https://www.python.org/), a high-level, general-purpose programming language. We will learn how to do basic algebra using Python and explore fundamental building blocks of programs such as variables, loops, and if-statements. We will also explore one of Python's most important libraries for data science - NumPy (https://numpy.org/).

# Python as a calculator


In [1]:
2 + 1

3

In [2]:
2 - 1

1

In [3]:
3 * 2

6

In [4]:
4 / 2

2.0

In [5]:
5 % 2 # This is the modulus operator and this is a comment.

1

In [6]:
4 ** 2

16

# Everyone's first program: "Hello, World!"

It is a decades-long tradition for beginner programmers like yourself to print out a "Hello, World!" message to the console as their first program. It is a simple, yet effective way of getting a taste of what programming is all about. (https://medium.com/the-software-guild-blog/the-history-of-hello-world-175440f77776)

In [7]:
print("Hello, World!")

Hello, World!


Although simple, there is already a lot going on in this program. First of all, we have the ```print()``` function, which is one of Python's built-in functions (https://docs.python.org/3/library/functions.html). What we pass within the parentheses of the ```print()``` function is known as an *argument*. This particular argument's *type* is what we call a *string* (```str``` in Python). A string is essentially a *sequence* of *characters*; unlike some other languages, Python has no separate character type, so a single character is simply a string of length one.

**Takeaway message:** if you want to display a piece of text on the screen, put it within single or double quotation marks ('Hello, World!' or "Hello, World!") and pass it to the ```print()``` function.
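
As a small illustration of the takeaway above, the sketch below suggests that single and double quotation marks produce the same kind of value. The variable names are made up for this example, and ```type()``` and ```len()``` are two further built-in functions that are not covered in this notebook.

```python
# Single and double quotation marks create the same kind of value.
greeting_single = 'Hello, World!'
greeting_double = "Hello, World!"

print(greeting_single == greeting_double)  # The two strings hold the same text.
print(type(greeting_double))               # The type of a string is str.
print(len(greeting_double))                # Number of characters in the string.
```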

# Variables

One's understanding of variables as a mathematical concept is almost fully applicable to programming. In mathematics, a variable is a symbol that refers to a value that can change. In programming, a variable refers to a memory location that stores a value that can be changed. In computer science, we say that we *assign* a value to a variable. Note that ```=``` is the assignment operator and does not stand for equality!

In [8]:
x = 1
y = 2
print("x + y = ", x + y)

x = 3
y = 4
print("x + y = ", x + y)

x + y =  3
x + y =  7


Notice that Python code is executed line-by-line and that print statements can be a lot more complicated than what we previously saw.
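
To make the difference between assignment and equality more concrete, here is a minimal sketch with a hypothetical variable ```count```. It also uses the equality operator ```==```, which is introduced properly later in this notebook.

```python
count = 1          # Assign the value 1 to the variable count.
count = count + 1  # Read the current value, add 1, and store the result back.
print("count =", count)

# == compares two values and produces True or False; it does not assign anything.
print("count == 2 is", count == 2)
```

Read as a mathematical equation, ```count = count + 1``` would make no sense; read as an assignment, it simply updates the stored value.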

# Lists

Data structures are essential in programming, as they allow us to organize data in an efficient manner. Our first taste of data structures comes in the form of *lists*. But what is a list? Essentially, a list is a data structure that contains *n* elements, for example, strings, integers, etc. Here are some important properties of Python lists:


*   Lists are indexable on the range [0, n-1].
*   Lists are dynamic, meaning we can grow and shrink their size by adding and removing elements.
*   Lists are mutable, meaning we can change the value stored at any given index.



In [9]:
ints   = [1, 2, 3]
floats = [1.61, 2.71, 3.14]
words  = ["Denmark", "Copenhagen", "hospital"]

In [10]:
print(ints)
print(floats)
print(words)

[1, 2, 3]
[1.61, 2.71, 3.14]
['Denmark', 'Copenhagen', 'hospital']


What if you want a single element from a given list? This is known as indexing.

In [11]:
print(ints[0])      # Access the first element in the ints list.
print(floats[1])    # Access the second element in the floats list.
print(words[2])     # Access the third element in the words list.

1
2.71
hospital


What if you want multiple elements from a given list that appear in sequence? This is known as slicing.

In [12]:
print(ints[0:2])

[1, 2]
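
A slice ```a[start:stop]``` includes the element at ```start``` but excludes the one at ```stop```. The sketch below shows a few common variations on a hypothetical list ```values```; negative indices are not covered above and are included only as an aside.

```python
values = [10, 20, 30, 40, 50]  # Hypothetical example list.

print(values[1:4])  # Elements at indices 1, 2, and 3; index 4 is excluded.
print(values[:2])   # Omitting the start means "from the beginning".
print(values[3:])   # Omitting the stop means "to the end".
print(values[-1])   # A negative index counts from the end of the list.
```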


In [13]:
print("Original: ", ints)
ints.append(4)
print("Appended: ", ints)
ints.remove(1)
print("Removed:  ", ints)

Original:  [1, 2, 3]
Appended:  [1, 2, 3, 4]
Removed:   [2, 3, 4]


In [14]:
ints[2] = 5
print("Mutated: ", ints)

Mutated:  [2, 3, 5]
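
Note that ```ints.remove(1)``` above removed the *value* 1, not the element at index 1. Here is a minimal sketch of the difference, using a hypothetical list ```letters``` and the list method ```pop()```, which is not used elsewhere in this notebook.

```python
letters = ["a", "b", "c", "b"]  # Hypothetical example list.

letters.remove("b")       # Removes the first element equal to "b".
print(letters)            # ['a', 'c', 'b']

removed = letters.pop(0)  # Removes and returns the element at index 0.
print(removed, letters)   # a ['c', 'b']
```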


**Note:** in reality, Python lists are a type of dynamic array. This is not something you need to worry about at this stage. However, consider the following video if you would like to learn more about dynamic arrays: https://www.youtube.com/watch?v=PEnFFiQe1pM&list=PLDV1Zeh2NRsB6SWUrDFW2RmDotAfPbeHu&index=5

Lists can also be multi-dimensional.

In [15]:
list2d = [[0, 1, 2], [3, 4, 5]]
list3d = [[[0, 1, 2], [3, 4, 5]], [[6, 7, 8], [9, 10, 11]]]

Indexing multi-dimensional lists is conceptually the same as indexing a simple 1D list.

In [16]:
print(list2d[0][1])
print(list3d[1][0][1])

1
7


# For-loops

For-loops are defined using the ```for``` keyword. They require a sequence to loop through and a properly indented body that defines what to do at each step.

In [17]:
for i in range(10):
    print(i)

0
1
2
3
4
5
6
7
8
9


```range()``` is another built-in function that takes an integer as its argument and returns an immutable sequence type. ```i``` is a variable that holds a different value at each step of the for-loop.

We saw above that when we use the ```print()``` function to display the contents of a list, it will print all of them together on the same line. What if we intend a different behavior? What if we want the contents printed one-by-one?

In [18]:
print("Start looping...")

for word in words:
    print("----------")
    print(word)
    print("----------")

print("...stop looping!")

Start looping...
----------
Denmark
----------
----------
Copenhagen
----------
----------
hospital
----------
...stop looping!


Indentation in Python matters! Notice that the lines that are indented after we define the for-loop are executed at each iterative step. On the other hand, the non-indented lines before and after the for-loop definition are executed only once.

# The bool type and the if-statement

Logic is essential for programming as it allows us to evaluate whether a given statement is true or false. ```True``` and ```False``` are values of type ```bool```, named after the English mathematician George Boole, a pioneer in the field of logic. The standard operations of ```and```, ```or```, and ```not``` can be used to manipulate boolean values.



In [19]:
isTrue  = True
isFalse = False

In [20]:
print(isTrue and isTrue)
print(isTrue and isFalse)
print(isFalse and isTrue)
print(isFalse and isFalse)

True
False
False
False


In [21]:
print(isTrue or isTrue)
print(isTrue or isFalse)
print(isFalse or isTrue)
print(isFalse or isFalse)

True
True
True
False


In [22]:
print(not isTrue)
print(not isFalse)

False
True


We can check if two boolean values are the same by using the equality operator ```==```.



In [23]:
print(isTrue == isTrue)
print(isTrue == isFalse)
print(isFalse == isTrue)
print(isFalse == isFalse)

True
False
False
True
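
Boolean values rarely appear as literals in practice; more often they are the result of a comparison. A minimal sketch, assuming a hypothetical variable ```age``` and the comparison operators ```>```, ```>=```, and ```<```, which are not introduced above:

```python
age = 42  # Hypothetical value used only for this example.

print(age > 18)                # A comparison produces a bool.
print(age == 42)               # == checks whether two values are equal.
print(age >= 18 and age < 65)  # Comparisons can be combined with and/or/not.
print(type(age > 18))          # The result is of type bool.
```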


The reason we care about these values and operations is that they allow us to construct one of the cornerstones of programming - the *if-statement*. The importance of the if-statement lies in that it allows us to control the flow of programs based on certain conditions.

In [24]:
if isTrue == True:
    print("This is true!")

This is true!


In [25]:
if isFalse == True:
    print("This will not print!")
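
An if-statement can be extended with an ```else``` branch that runs when the condition is false. The sketch below uses a hypothetical flag ```is_true``` and also shows that a boolean can be tested directly, without comparing it to ```True```.

```python
is_true = True  # Hypothetical flag, mirroring isTrue above.

if is_true:     # A bool can be tested directly, without "== True".
    message = "This is true!"
else:           # The else branch runs only when the condition is False.
    message = "This is false!"

print(message)
```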

# Worked example

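Now let's put everything we have learned so far into a worked example. For this purpose, we will use a list containing the integers from 0 to 9, followed by the string "10" as a deliberately invalid element. We will loop through the list and check whether the current value ```n``` is an integer and, if so, whether it is even. If it is even, we will print out "Even."; if it is odd, we will print out "Odd."; and if it is not an integer, we will print out a short message saying so.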



In [26]:
nums = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, "10"]

In [27]:
for n in nums:
    if type(n) is int:
        if n % 2 == 0:
            print("Even.")
        elif n % 2 == 1:
            print("Odd.")
    else:
        print("n must be of type integer.")

Even.
Odd.
Even.
Odd.
Even.
Odd.
Even.
Odd.
Even.
Odd.
n must be of type integer.


# Functions

Finally, let's take a look at functions as a programming concept. In simple terms, a Python function is a named piece of code designed to carry out a specific task. Generally speaking, programmers are encouraged to keep functions simple and dedicated, i.e. a function should perform one task and one task only. Functions improve readability and make code reusable: once defined, the same function can be called wherever that task is needed.

In [28]:
def isEven(n):
    """
    This is a docstring.

    Determine whether a given number is even.

    Parameters:
    n (int): The number to check.

    Returns:
    bool or None: True if the number is even, False if it is odd,
                  and None if the input is not an integer.
    """
    # Use a separate name for the result so that it does not
    # shadow the name of the function itself.
    result = None

    if type(n) is int:
        if n % 2 == 0:
            result = True
        else:
            result = False
    else:
        result = None

    return result


In [29]:
for n in nums:
    print(isEven(n))

True
False
True
False
True
False
True
False
True
False
None
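
The same behavior can be written more compactly, because the comparison ```n % 2 == 0``` already evaluates to ```True``` or ```False```. The function name ```is_even_short``` below is hypothetical; this is one possible variant, not a replacement for the version above.

```python
def is_even_short(n):
    """Return True if n is even, False if it is odd, None if n is not an int."""
    if type(n) is not int:
        return None       # Leave the function early for non-integer input.
    return n % 2 == 0     # The comparison itself is the boolean result.


for value in [0, 1, 2, "10"]:
    print(is_even_short(value))
```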


# Scientific computing with NumPy

NumPy (https://numpy.org/) is a core library for scientific computing in Python. It is essential for other commonly used libraries such as Pandas, Scikit-Learn, and SciPy among many others. NumPy owes its popularity to its efficient implementation of n-dimensional arrays that allows for seamless linear algebra operations.

In [30]:
import numpy as np

In [31]:
array1d = np.array([0, 1, 2])
array2d = np.array([[0, 1, 2],
                    [3, 4, 5],
                    [6, 7, 8]])
array3d = np.array([[[0, 1, 2],
                    [3, 4, 5],
                    [6, 7, 8]],
                    [[0, 1, 2],
                    [3, 4, 5],
                    [6, 7, 8]],
                    [[0, 1, 2],
                    [3, 4, 5],
                    [6, 7, 8]]])

In [32]:
print(array1d)
print("----------")
print(array2d)
print("----------")
print(array3d)

[0 1 2]
----------
[[0 1 2]
 [3 4 5]
 [6 7 8]]
----------
[[[0 1 2]
  [3 4 5]
  [6 7 8]]

 [[0 1 2]
  [3 4 5]
  [6 7 8]]

 [[0 1 2]
  [3 4 5]
  [6 7 8]]]


# Indexing and slicing a NumPy array

For the basic cases shown here, indexing and slicing NumPy arrays works the same way as indexing and slicing Python lists.

In [33]:
print(array1d[0])
print("----------")
print(array2d[1][2])
print("----------")
print(array3d[2][1][0])

0
----------
5
----------
3


In [34]:
print(array1d[0:2])
print("----------")
print(array2d[0:2])
print("----------")
print(array3d[1][0:2])

[0 1]
----------
[[0 1 2]
 [3 4 5]]
----------
[[0 1 2]
 [3 4 5]]
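
NumPy arrays additionally accept several indices inside a single pair of square brackets, separated by commas, which Python lists do not. A minimal sketch, using a hypothetical array ```grid``` with the same values as ```array2d```:

```python
import numpy as np

grid = np.array([[0, 1, 2],
                 [3, 4, 5],
                 [6, 7, 8]])  # Same values as array2d above.

print(grid[1, 2])      # Row 1, column 2; equivalent to grid[1][2].
print(grid[:, 0])      # Every row, column 0: the first column.
print(grid[0:2, 1:3])  # Rows 0 and 1, columns 1 and 2.
```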


# NumPy array properties

Now we will take a closer look at some useful NumPy array properties. We will focus on ```array2d```, as it is complex and interesting, yet intuitive to work with.



In [35]:
print("The shape of array2d is: ", array2d.shape)
print("The dimensionality of array2d is: ", array2d.ndim)
print("The size of array2d is: ", array2d.size)
print("The type of data stored in array2d is: ", array2d.dtype)

The shape of array2d is:  (3, 3)
The dimensionality of array2d is:  2
The size of array2d is:  9
The type of data stored in array2d is:  int64


We can change the shape of the array with the ```np.reshape()``` function.

In [36]:
array2d_reshaped = np.reshape(array2d, (1, 9))

In [37]:
print("The shape of array2d_reshaped is: ", array2d_reshaped.shape)
print("The dimensionality of array2d_reshaped is: ", array2d_reshaped.ndim)
print("The size of array2d_reshaped is: ", array2d_reshaped.size)
print("The type of data stored in array2d_reshaped is: ", array2d_reshaped.dtype)

The shape of array2d_reshaped is:  (1, 9)
The dimensionality of array2d_reshaped is:  2
The size of array2d_reshaped is:  9
The type of data stored in array2d_reshaped is:  int64


It is perhaps a bit counter-intuitive that the reshaped array is still 2D. Let's see why this is the case.

In [38]:
print(array2d_reshaped)

[[0 1 2 3 4 5 6 7 8]]


What if we want it to be 1D?

In [39]:
array2d_flattened = array2d.flatten()

In [40]:
print("The shape of array2d_flattened is: ", array2d_flattened.shape)
print("The dimensionality of array2d_flattened is: ", array2d_flattened.ndim)
print("The size of array2d_flattened is: ", array2d_flattened.size)
print("The type of data stored in array2d_flattened is: ", array2d_flattened.dtype)

The shape of array2d_flattened is:  (9,)
The dimensionality of array2d_flattened is:  1
The size of array2d_flattened is:  9
The type of data stored in array2d_flattened is:  int64


How are ```array2d_reshaped``` and ```array2d_flattened``` different?



In [41]:
print("The first element of array2d_reshaped: ", array2d_reshaped[0])
print("The first element of array2d_flattened: ", array2d_flattened[0])

The first element of array2d_reshaped:  [0 1 2 3 4 5 6 7 8]
The first element of array2d_flattened:  0
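
Both arrays hold the same nine values; only the number of dimensions differs. The sketch below illustrates this with a hypothetical array ```grid``` and also uses ```-1``` as a dimension, which asks NumPy to infer that dimension from the total number of elements. ```np.arange()``` and the ```.reshape()``` method are used here for brevity.

```python
import numpy as np

grid = np.arange(9).reshape(3, 3)    # Same values as array2d above.

row_vector = grid.reshape(1, -1)     # 1 row; NumPy infers 9 columns.
column_vector = grid.reshape(-1, 1)  # NumPy infers 9 rows; 1 column.
flat = grid.reshape(-1)              # A single inferred dimension gives a 1D array.

print(row_vector.shape, column_vector.shape, flat.shape)
print(row_vector.ndim, column_vector.ndim, flat.ndim)
```

A reshape only succeeds when the new shape has the same total number of elements as the original, here 9.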


# Element-wise operations in NumPy

NumPy supports the same arithmetic operators as plain Python, applied element by element. We shall take a look at those by means of two 2D arrays.

In [42]:
m1 = np.array([[0, 1],
               [2, 3]])

m2 = np.array([[4, 5],
               [6, 7]])

In [43]:
m3_add = m1 + m2
m3_sub = m2 - m1
m3_mul = m1 * m2
m3_div = m3_mul / m2
m3_pow = m1 ** 2

In [44]:
print(m3_add)

[[ 4  6]
 [ 8 10]]


In [45]:
print(m3_sub)

[[4 4]
 [4 4]]


In [46]:
print(m3_mul)

[[ 0  5]
 [12 21]]


In [47]:
print(m3_div)

[[0. 1.]
 [2. 3.]]


In [48]:
print(m3_pow)

[[0 1]
 [4 9]]
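
The same operators also work between an array and a single number, in which case the number is applied to every element. The sketch below uses a hypothetical array ```m``` and contrasts this with a plain Python list, where ```*``` repeats the list instead.

```python
import numpy as np

m = np.array([[0, 1],
              [2, 3]])  # Same values as m1 above.

print(m + 10)      # 10 is added to every element.
print(m * 2)       # Every element is doubled.

print([0, 1] * 2)  # For a Python list, * repeats the list: [0, 1, 0, 1]
```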


NumPy also offers a range of useful aggregation functions.

In [49]:
print("The sum of m1: ", np.sum(m1))
print("The mean of m1: ", np.mean(m1))
print("The std of m1: ", np.std(m1))
print("The min of m1: ", np.min(m1))
print("The max of m1: ", np.max(m1))

The sum of m1:  6
The mean of m1:  1.5
The std of m1:  1.118033988749895
The min of m1:  0
The max of m1:  3
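
By default, these functions aggregate over the whole array. They also accept an ```axis``` argument to aggregate along one dimension only; a minimal sketch with a hypothetical array ```m```:

```python
import numpy as np

m = np.array([[0, 1],
              [2, 3]])  # Same values as m1 above.

print(np.sum(m, axis=0))   # Sum down each column.
print(np.sum(m, axis=1))   # Sum across each row.
print(np.mean(m, axis=0))  # Mean of each column.
```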


# Working with matrices

Last but not least, NumPy allows for efficient matrix operations, such as matrix transposition.

In [50]:
m1_transpose = m1.T

In [51]:
print(m1_transpose)

[[0 2]
 [1 3]]


Matrix multiplication.

In [52]:
mat_mul = m1 @ m2
mat_dot = np.dot(m1, m2)

In [53]:
print(mat_mul)
print("----------")
print(mat_dot)

[[ 6  7]
 [26 31]]
----------
[[ 6  7]
 [26 31]]
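
It is easy to confuse matrix multiplication (```@```) with the element-wise product (```*```) shown earlier. The sketch below compares the two on hypothetical arrays ```a``` and ```b``` and recomputes one entry of the matrix product by hand.

```python
import numpy as np

a = np.array([[0, 1],
              [2, 3]])  # Same values as m1 above.
b = np.array([[4, 5],
              [6, 7]])  # Same values as m2 above.

elementwise = a * b     # Multiplies matching positions.
matrix_product = a @ b  # Rows of a combined with columns of b.

# Top-left entry of the matrix product: first row of a times first column of b.
top_left = a[0, 0] * b[0, 0] + a[0, 1] * b[1, 0]  # 0*4 + 1*6

print(elementwise)
print(matrix_product)
print(top_left == matrix_product[0, 0])
```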


Matrix inversion.

In [54]:
m1_inv = np.linalg.inv(m1)

In [55]:
print(m1_inv)

[[-1.5  0.5]
 [ 1.   0. ]]
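
One way to sanity-check an inverse is to multiply it by the original matrix, which should give the identity matrix up to small floating-point errors. A minimal sketch, using a hypothetical array ```a``` together with ```np.eye()``` and ```np.allclose()```, neither of which is introduced above:

```python
import numpy as np

a = np.array([[0, 1],
              [2, 3]])  # Same values as m1 above.

a_inv = np.linalg.inv(a)
product = a @ a_inv     # Should be (approximately) the 2 x 2 identity matrix.

print(product)
print(np.allclose(product, np.eye(2)))  # Compare with a small tolerance.
```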


Determinant calculation.

In [56]:
m2_det = np.linalg.det(m2)

In [57]:
print(m2_det)

-2.000000000000003
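
The result is not exactly -2 because ```np.linalg.det()``` works with floating-point numbers, which carry small rounding errors. For a 2 $\times$ 2 matrix the determinant can be checked by hand as $ad - bc$; the sketch below does so for a hypothetical array ```b``` and compares the two results with ```np.isclose()```.

```python
import numpy as np

b = np.array([[4, 5],
              [6, 7]])  # Same values as m2 above.

det_numpy = np.linalg.det(b)
det_by_hand = b[0, 0] * b[1, 1] - b[0, 1] * b[1, 0]  # 4*7 - 5*6

print(det_by_hand)
print(np.isclose(det_numpy, det_by_hand))  # Equal up to a small tolerance.
```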


Occasionally, we know the size and dimensionality of the matrix we wish to arrive at, but we are yet to figure out its values. In those cases, computer scientists initialize a dummy matrix that contains the same value everywhere. NumPy offers functionality to do that for us.

In [58]:
array_zeros = np.zeros((3,3))
array_ones  = np.ones((3,3))

In [59]:
print(array_zeros)
print("----------")
print(array_ones)

[[0. 0. 0.]
 [0. 0. 0.]
 [0. 0. 0.]]
----------
[[1. 1. 1.]
 [1. 1. 1.]
 [1. 1. 1.]]


One such occasion is when we want to implement one of the element-wise operations from scratch. We know that the final matrix should have 4 elements and be 2 $\times$ 2, but we do not know the values yet.

In [60]:
m3 = np.zeros((2,2))

for i in range(m1.shape[0]):
    for j in range(m1.shape[1]):
        m3[i][j] = m1[i][j] * m2[i][j]

print(m3)

[[ 0.  5.]
 [12. 21.]]
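
The loop gives the same values as the built-in element-wise product, although the printed result contains decimal points because ```np.zeros()``` creates floating-point values by default. A minimal sketch of the comparison, using hypothetical names ```a```, ```b```, ```looped```, and ```vectorized```:

```python
import numpy as np

a = np.array([[0, 1],
              [2, 3]])  # Same values as m1 above.
b = np.array([[4, 5],
              [6, 7]])  # Same values as m2 above.

looped = np.zeros((2, 2))  # Placeholder filled in by the loop below.
for i in range(a.shape[0]):
    for j in range(a.shape[1]):
        looped[i, j] = a[i, j] * b[i, j]

vectorized = a * b  # The same operation without an explicit loop.

print(np.array_equal(looped, vectorized))  # Same values in both arrays.
print(looped.dtype, vectorized.dtype)      # Float placeholder vs. integer inputs.
```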


# Why should I care about NumPy arrays when we already have Python lists?

Because NumPy arrays are typically much faster for numerical operations on large collections of values. Let's look at a simple experiment. We will start by importing the ```time``` library. Note that the details of ```time``` are not important for this course.



In [61]:
import time

We will then define a Python list and a NumPy array of the same shape (1D) and size.

In [62]:
size     = 10**5
py_list  = list(range(size))
np_array = np.arange(size)

We will then record the current time, iterate over the Python list and multiply each of its elements by 2, and finally get the total runtime by subtracting the start time from the time at which the operation finishes.

In [63]:
start_time     = time.time()
py_list_result = [x * 2 for x in py_list] # This is a list comprehension.
py_list_time   = time.time() - start_time

Do the same for the NumPy array.

In [64]:
start_time      = time.time()
np_array_result = np_array * 2
np_time         = time.time() - start_time

Simply compare the results.

In [65]:
print(f"Python list time: {py_list_time:.6f} seconds")
print(f"NumPy array time: {np_time:.6f} seconds")
print(f"Speedup: {py_list_time / np_time:.2f}x faster with NumPy")

Python list time: 0.011798 seconds
NumPy array time: 0.001438 seconds
Speedup: 8.20x faster with NumPy
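
A single timing like this can vary noticeably from run to run, so the exact speedup should be read as indicative only. One possible way to get a steadier estimate is to repeat each operation several times with the standard-library ```timeit``` module, as sketched below. The number of repetitions (```repeats```) is an arbitrary choice for this example, and the measured times will differ between machines.

```python
import timeit

import numpy as np

size = 10**5
py_list = list(range(size))
np_array = np.arange(size)

repeats = 20  # Arbitrary number of repetitions chosen for this sketch.

# timeit.timeit() runs the given function `number` times and returns the total time.
list_time = timeit.timeit(lambda: [x * 2 for x in py_list], number=repeats)
numpy_time = timeit.timeit(lambda: np_array * 2, number=repeats)

print(f"Python list: {list_time / repeats:.6f} seconds per run")
print(f"NumPy array: {numpy_time / repeats:.6f} seconds per run")
```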
