[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/guvendemirel/QMULSBM_PhDWorkshop/blob/master/SBM_PhD_python_workshop_part2.ipynb)

# QMUL SBM PhD Python Workshop - Part 2

In this session, we will work on the building blocks of programming (conditional statements, loops, functions) and the NumPy package.

## Control Flow

Control flow is the order in which blocks of code are executed, as determined by conditional statements, loops, function calls, etc. The Boolean data type and Boolean operations are central to control flow.

The scalar Boolean data type in Python is `bool`, which has two instances `True` and `False`. 

In [None]:
type(True)

### Boolean Operations
For Boolean `a` and `b`:
- `a & b`: AND - True if both a and b are True
- `a | b`: OR - True if at least one of a and b is True
- `a ^ b`: XOR - True if exactly one of a and b is True

For `a` and `b` of any type:
- `a == b`: True if the value of a equals the value of b
- `a != b`: True if the value of a is not equal to the value of b
- `a < b`, `a <= b`: True if a is less than (less than or equal to) b
- `a > b`, `a >= b`: True if a is greater than (greater than or equal to) b
- `a is b`: True if a and b reference the same Python object
- `a is not b`: True if a and b reference different Python objects
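
The short sketch below illustrates the Boolean operators on two hand-picked values. It also uses Python's keywords `and`, `or`, and `not`, which are not listed above but give the same results on `bool` values. The variable names are only for illustration.

```python
a, b = True, False

# On bool values, the operators & and | agree with the keywords and / or
and_result = (a & b, a and b)   # (False, False)
or_result = (a | b, a or b)     # (True, True)
xor_result = a ^ b              # True: exactly one operand is True
not_result = not a              # False

# & binds more tightly than comparison operators, so parenthesise each comparison
x = 5
in_range = (x > 0) & (x < 10)   # True
chained = 0 < x < 10            # True: Python also allows chained comparisons

print(and_result, or_result, xor_result, not_result, in_range, chained)
```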

In [None]:
# What outputs do you expect?
print(True & False)
print(True ^ True)

In [None]:
list1 = [1, 2, 0]
list2 = [1, 2, 0]
# What output do you expect?
print(list1 == list2, list1 is list2)

In [None]:
list3 = list1
# What output do you expect?
print(list2 == list3, list1 is list3, list2 is list3)

## Choice Statements
Choice statements control the flow of the program depending on which condition is met.
```python
if condition1:
    statement1
elif condition2:
    statement2
elif condition3:
    statement3
else:
    statement4
```
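
As a concrete instance of this pattern, the sketch below labels a temperature reading. The variable `temperature` and the thresholds are hypothetical values chosen for illustration. Conditions are checked from top to bottom, and only the first branch whose condition is True is executed.

```python
temperature = 18.5  # hypothetical value, in degrees Celsius

if temperature < 0:
    label = "freezing"
elif temperature < 15:
    label = "cold"
elif temperature < 25:
    label = "mild"
else:
    label = "hot"

print(label)  # mild
```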

As an example, we will generate two standard normal variables and choose the greater value. 

In [None]:
# import numpy
__

np.random.seed(42) #set the seed for pseudo-random numbers 

# a standard normal random number 
x = np.random.standard_normal() 

# generate another standard normal random number
y = __

# format replaces the {} in the string with the corresponding argument
print("x={0:.2f} and y={1:.2f}".format(x, y)) 

In [None]:
# Print the maximum of x and y
__: 
    print("Maximum is {0:.2f}".format(__))
__:
    print("Maximum is {0:.2f}".format(__))

## Loops 

`for` loops are used for iterating over collections, such as lists and tuples, and over any other iterable object.

The `range` function creates a lazy sequence of integers. For example, `range(1, 5)` iterates over 1, 2, 3, and 4. Unlike a list or tuple, a `range` does not store all of its values: each value is produced only when the loop needs it, which saves a lot of memory when looping over a large range.
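
The sketch below gives a rough sense of the memory difference, using `sys.getsizeof` from the standard library. The exact byte counts depend on the Python version and platform, so none are quoted here.

```python
import sys

lazy_numbers = range(1, 5)
print(list(lazy_numbers))  # [1, 2, 3, 4]

big_range = range(1_000_000)
big_list = list(big_range)

# getsizeof on the list counts only its array of references,
# not the integer objects themselves, so it understates the real cost
range_bytes = sys.getsizeof(big_range)
list_bytes = sys.getsizeof(big_list)
print(range_bytes, list_bytes)
```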

In Python, explicit `for` loops are usually slower than list comprehensions and NumPy array operations (both covered below), so prefer those alternatives for simple element-wise work and keep `for` loops for logic that is too complex to express that way.

To measure how long an operation takes in a notebook, you can use the `%%time` and `%%timeit` cell magics.
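
The cell magics only work in IPython and Jupyter. Outside a notebook, the standard-library `timeit` module offers a similar measurement. The sketch below times a hand-written loop against the built-in `sum`; the helper `total_with_loop` is a hypothetical function written for this illustration, and the timings will vary from machine to machine.

```python
import timeit

def total_with_loop(n):
    """Add up 0, 1, ..., n-1 with an explicit loop."""
    total = 0
    for i in range(n):
        total += i
    return total

# number=100 runs each statement 100 times and returns the total time in seconds
loop_seconds = timeit.timeit(lambda: total_with_loop(10_000), number=100)
builtin_seconds = timeit.timeit(lambda: sum(range(10_000)), number=100)

print("loop: {0:.4f} s, built-in sum: {1:.4f} s".format(loop_seconds, builtin_seconds))
```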

Example: create a loop that calculates the square of one million randomly generated standard normal variables.

In [None]:
# Generate 1,000,000 standard normal variables
vals = np.__.__(__)

In [None]:
%%timeit
# Create an empty list
vals_squared = __
# Loop over the vals array and append to the list the square of the current element
for val in vals: 
    __

You can do the same with a **list comprehension** in less time:

In [None]:
%%timeit
vals_squared = [val ** 2 for val in vals]

It is even faster to use Numpy methods and operations, which are directly implemented in C.

In [None]:
%%timeit
vals_squared = vals ** 2 # note that this works with Numpy arrays, not lists

We often need to combine `for` loops with `if`/`else` statements. You can also do this in list comprehensions:

In [None]:
# For the list below
my_list = [1, 10, 'analytics', 'business analytics', 'data', 'machine learning', 'statistics', 'python']

# We want to convert all strings to uppercase (common task in data cleaning) 
# and skip other types 
# Hint: isinstance(x, type) checks if x is an instance of class type 
__
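
As a separate illustration (not the solution to the exercise above), a condition can appear in two places in a list comprehension. A trailing `if` filters elements out, while an `if`/`else` expression at the front transforms every element. The list `numbers` is made up for this sketch.

```python
numbers = [3, -1, 4, -1, 5, -9]

# Trailing if: keep only the elements that satisfy the condition
positives = [n for n in numbers if n > 0]

# Leading if/else: keep every element, but replace the negative ones with 0
clipped = [n if n > 0 else 0 for n in numbers]

print(positives)  # [3, 4, 5]
print(clipped)    # [3, 0, 4, 0, 5, 0]
```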

## Functions
Functions are crucial building blocks of programming: they eliminate redundancy and provide abstraction. If you repeat code that solves a non-trivial task in different parts of your program, it is a good indication that you should introduce a function. Organizing your code into functions makes it easier to understand and easier to maintain. We have so far used several built-in functions, e.g. `print`, and functions from other packages such as NumPy, e.g. `np.sqrt()`. Functions are called by providing their arguments, which are listed in each function's help text.

We can also define our own functions. Let's now define a simple function. **Positional arguments** must always be given a value when the function is called. **Keyword arguments** have default values, which are used if no value is provided. The function definition syntax is as follows:

```python
def function_name(arg1, arg2 = val2):
    code
    return value
```
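
A minimal sketch of this syntax follows. The function `scale` and its parameter names are hypothetical, chosen only to show one positional argument and one keyword argument with a default value.

```python
def scale(value, factor=2):
    """Multiply value by factor; factor defaults to 2 if omitted."""
    return value * factor

default_call = scale(5)                    # factor falls back to its default: 10
positional_call = scale(5, 3)              # both arguments given by position: 15
keyword_call = scale(value=5, factor=10)   # both arguments given by name: 50

print(default_call, positional_call, keyword_call)
```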

### Namespaces
A local namespace (variable scope) is created each time the function is called, and the arguments are loaded into it automatically. Inside the function you can read variables from the global namespace, but assigning to such a name does not change the global variable: it creates a new local variable with the same name instead. To rebind a global variable from inside a function, declare it with the `global` keyword first.
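
The three cases can be sketched with a hypothetical module-level variable `counter`. The function names are illustrative only, and the sketch assumes it is run at the top level of a script or notebook cell.

```python
counter = 0  # global variable

def read_counter():
    # Reading a global variable needs no declaration
    return counter + 1

def shadow_counter():
    # Assignment creates a new local variable; the global one is untouched
    counter = 100
    return counter

def bump_counter():
    # The global declaration makes the assignment rebind the global name
    global counter
    counter += 1
    return counter

read_value = read_counter()      # 1
shadow_value = shadow_counter()  # 100
after_shadow = counter           # still 0
bump_value = bump_counter()      # 1
after_bump = counter             # now 1

print(read_value, shadow_value, after_shadow, bump_value, after_bump)
```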

As an example, let's write a function that cleans the names of subject areas.

In [None]:
import re #regular expressions library for handling text

courses = [' accounting ', 'finance ', 'Marketing ', 'supply chain management#', '?interna92tional BuSiness1\n']

# Create a list of characters to remove
remove_chars = '[_!#?*0-9]'

def clean_name(name):
    cl_name = __.strip() #remove the beginning and ending white spaces and new lines
    cl_name = re.sub(remove_chars, '', __) #remove unwanted characters
    cl_name = __.title() #first letter capital 
    __

# Write a list comprehension that applies clean_name to each name in the courses list
cleaned_courses = __
cleaned_courses

Finally, check the variable scope:

In [None]:
# What happens if you try accessing cl_name outside the function (in the global scope), 
# where it is not in the namespace?
cl_name

Since everything in Python is an object, arguments are passed by object reference rather than by value, which is the default in some other programming languages. What the caller observes depends on whether the argument is of a mutable or an immutable data type, just as with variable assignment, which we have seen before. The following exercises illustrate variable scopes and the behaviour for mutable and immutable arguments.

In [None]:
# Try the following code - Version 1
def clean_name(name):
    name = name.strip() 
    name = re.sub(remove_chars, '', name) 
    name = name.title()
    return name

name1 = "?businesS analytics#"
name2 = clean_name(__)

# What output do you expect and why?
print(name1, name2)

In [None]:
# Try the following code - Version 2
def clean_name_alt(name):
    name1 = name.strip() 
    name1 = re.sub(remove_chars, '', name1) 
    name1 = name1.title() #first letter capital 
    return name1

name1 = "something else"
name2 = "?businesS analytics#"
name3 = clean_name_alt(name2)

# What output do you expect and why?
print(name1, name2, name3)

In [None]:
# Try the following code - Version 3
def clean_name_alt(name):
    # set the scope of the variable to global
    __ 
    name1 = name.strip() 
    name1 = re.sub(remove_chars, '', name1) 
    name1 = name1.title() #first letter capital 
    return __

name1 = "something else"
name2 = "?businesS analytics#"
name3 = clean_name_alt(name2)

# What output do you expect and why?
print(name1, name2, name3)

In [None]:
# Check whether name1 and name3 are the same object
__

In [None]:
# Working with list arguments

# Global scope:
courses = [' accounting ', 'finance ', 'Marketing ', 'supply chain management#', '?interna92tional BuSiness1\n']
no_courses = 0

def clean_names(name_list):
    # set the scope to global
    __ no_courses
    # loop over the list by both index and value
    for __, __ in __:
        # increment no_courses
        __
        # call clean_name function and update the list
        __        

# clean the courses list
__

# What output do you expect and why?
print(courses, no_courses)
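
The difference between mutable and immutable arguments can be summarised in a small standalone sketch. The functions `add_item` and `rebind_text` are hypothetical and unrelated to the course-cleaning code above.

```python
def add_item(items):
    # Lists are mutable: append changes the caller's list in place
    items.append("new")

def rebind_text(text):
    # Strings are immutable: upper() builds a new string, and the
    # assignment only rebinds the local name inside the function
    text = text.upper()
    return text

basket = ["old"]
add_item(basket)

greeting = "hello"
shouted = rebind_text(greeting)

print(basket, greeting, shouted)  # ['old', 'new'] hello HELLO
```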

## NumPy Package
NumPy is the main package for numerical computations in Python. It is used together with the pandas package for data analysis. NumPy is highly efficient even for large amounts of data because its methods are implemented in C. It provides methods, operations, and functions that can be applied to whole arrays without the need for loops or list comprehensions.

### Numpy Arrays
The core data structure of NumPy is the `ndarray` (N-dimensional array). Arrays are homogeneous, meaning that all elements have the same data type (mostly numeric or Boolean), unlike lists, which can hold elements of different types. You can apply arithmetic operations to whole arrays as if they were scalars.
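
Homogeneity can be observed through the `dtype` attribute. In the sketch below, NumPy picks a single type that can represent every element. The exact integer and string types reported depend on the platform and NumPy version, so the comments describe them only in general terms.

```python
import numpy as np

int_array = np.array([1, 2, 3])          # all integers: an integer dtype
mixed_array = np.array([1, 2.5, 3])      # the integers are upcast to float64
text_array = np.array([1, "two", 3.0])   # every element is converted to a string

print(int_array.dtype, mixed_array.dtype, text_array.dtype)
print(mixed_array)
```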

In [None]:
# Import the numpy package
__

# Set the seed for random numbers
__

# Create my_array 2 x 3 array of standard normal variables
my_array = __.__.__((2, 3)) 
my_array

All mathematical operators are applied element-wise:

In [None]:
# Multiply all entries of my_array by 10
10 * my_array

In [None]:
# Subtract 0.2*my_array from my_array
__

In [None]:
# Divide 10 by each entry of my_array
__

In [None]:
# Raise 0.2 to the power given by each entry of my_array
__

In [None]:
# Take the fourth power of each entry of my_array
__

You can access the shape of the array by the `shape` attribute:

In [None]:
# How many rows and columns does my_array have?
__

You can create arrays from other collections by using the `np.array()` function.

In [None]:
my_list = [3, 6, -2]
# Create my_array from my_list
my_array = __
# Change the values of both my_list and my_array
my_list *= 4  
my_array *= 2

# What results do you expect?
print(my_list, my_array)

In [None]:
my_list = [3, 6, -2]
# Create a list that contains 2 times each element of my_list
__

You can create two-dimensional arrays from lists with equal length:

In [None]:
data1 = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]]
# Create 2D array from data1
arr1 = __
# print arr1
__

A commonly used NumPy array method is `reshape((n1, n2))`, which reshapes the array into a 2D array with n1 rows and n2 columns (it can also produce higher-dimensional arrays). Another useful method is `ravel()`, which flattens the array into one dimension.
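
The sketch below applies both methods to a small array with different dimensions from the exercise that follows. It also uses `-1` as a dimension, which asks NumPy to infer that dimension from the array's size.

```python
import numpy as np

flat = np.arange(6)            # array([0, 1, 2, 3, 4, 5])
grid = flat.reshape((2, 3))    # 2 rows, 3 columns
auto = flat.reshape((3, -1))   # -1 is inferred from the size: here 2 columns
back = grid.ravel()            # flattened to one dimension again

print(grid.shape, auto.shape, back)
```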

In [None]:
# range(1, 13) -> 1, 2, 3, ..., 12
# Numpy's arange is similar to the range iterator but returns an array 
arr2 = __
# Print the array
arr2

In [None]:
# Reshape the array to 3x4 two-dimensional array
arr2.__

In [None]:
# Reshape the array to 2 rows and assign to arr3
arr3 = __

# Flatten arr3
__

You can compare two numerical arrays to form a Boolean array.

Create two 2D (2x4) arrays of standard normal random numbers and check, element by element, whether each entry of the first array is greater than or equal to the corresponding entry of the second array.

In [None]:
arr1 = __
arr2 = __
# create the boolean array that contains whether arr1 value is greater than or equal to arr2 value
__

## Indexing and Slicing Numpy Arrays
Indexing and slicing work in a very similar way to lists and tuples. If the array is multidimensional (two or more dimensions), you provide an index or slice for each dimension, separated by commas.
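
A minimal sketch of multidimensional indexing on a small hand-written array follows. The array `grid` is hypothetical; its values encode the row and column so the results are easy to check by eye.

```python
import numpy as np

grid = np.array([[10, 11, 12],
                 [20, 21, 22],
                 [30, 31, 32]])

single = grid[0, 1]         # row 0, column 1
last_column = grid[:, -1]   # every row, last column
block = grid[1:, :2]        # rows 1 onwards, first two columns

print(single)        # 11
print(last_column)   # [12 22 32]
print(block)
```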

In [None]:
arr1 = np.array([3, 6, -2, 7, 9])
# The element in index 1
print(__)
# Slice that starts with the index 2 up until the end
print(__)

With Numpy arrays, you can pass multiple indices as a list:

In [None]:
# Subscript elements at index 0, 2, and 3
arr1[__]

Working with 2D arrays:

In [None]:
arr2 = np.arange(12).reshape(4,3)
arr2

In [None]:
# Subscript to the element in row 1 and column 2
__

In [None]:
# Select the sub-matrix from row 2 to the last row, with columns 0 and 2 (not 1)
__

You can then assign values to these slices. If you assign a scalar, its value is repeated for each entry.

In [None]:
# For the sub-matrix selected above, set all values to 100
__

# Print arr2


You can also index by using logical conditions (Boolean indexing). This is specific to NumPy arrays: plain lists and tuples do not accept a Boolean array as an index.
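
A minimal sketch on a small made-up array: the comparison produces a Boolean mask, and the mask can be used both to select elements and to assign to them.

```python
import numpy as np

readings = np.array([4.0, -1.5, 7.2, -0.3, 2.8])  # hypothetical data

mask = readings < 0          # Boolean array, True where the value is negative
negatives = readings[mask]   # selection returns a copy of the matching elements
readings[mask] = 0.0         # assignment through the mask changes readings in place

print(mask)
print(negatives)
print(readings)
```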

In [None]:
# Data type conversion needed for the next replacement (remember homogeneous types)
arr2 = arr2.astype(np.float64)

# Identify the elements of arr2 which are greater than or equal to 7
__

# Set the values >= 7 to normal random numbers with mean 10 and standard deviation 2
arr2[__] = __(10, 2, np.sum(arr2 >= 7))
arr2

## Numpy functions
Numpy provides a wide range of functions and methods that can be applied efficiently on arrays.  

**Unary functions**:
They are all element-wise transformations.
- `abs`: Compute the absolute value
- `sqrt`: Compute the square root
- `exp`: Compute the exponential
- `log`,`log10`: Natural logarithm, log base 10
- `sign`: Compute the sign
- `rint`: Round to the nearest integer
- `isnan`: Return boolean array indicating whether each value is NaN (Not a Number)
- `cos`, `cosh`, `sin`, `sinh`, `tan`, `tanh`: Regular and hyperbolic trigonometric functions

**Binary functions:**
They take two arrays and return a single array as the result. 
- `maximum`: Element-wise maximum
- `minimum`: Element-wise minimum
- `mod`: Element-wise modulus
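
A few of the functions listed above are applied to small hand-picked arrays in the sketch below, so the results can be verified by eye. The array names are illustrative only.

```python
import numpy as np

values = np.array([-4.0, 1.0, 9.0])
other = np.array([0.0, 2.0, 5.0])

# Unary functions: one array in, one array of the same shape out
absolute = np.abs(values)     # [4., 1., 9.]
roots = np.sqrt(absolute)     # [2., 1., 3.]
signs = np.sign(values)       # [-1., 1., 1.]

# Binary functions: two arrays in, combined element by element
smaller = np.minimum(values, other)          # [-4., 1., 5.]
remainder = np.mod(np.array([7, 8, 9]), 4)   # [3, 0, 1]

print(absolute, roots, signs, smaller, remainder)
```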

In [None]:
# Create an array of 0, 1, ...,9
arr1 = __
# Exponentiate that array elementwise
print(__)

Create two standard normal 2d (2x4) arrays and choose the maximum of the two arrays for each element

In [None]:
arr1 = __
arr2 = __
# Create arr3, which is the element-wise maximum of the two
arr3 = __
# Print arr1, arr2, arr3

In [None]:
arr = np.arange(12).reshape(4,3)
arr

In [None]:
# Overall mean, sum, standard deviation
arr.__, arr.__, arr.__

In [None]:
# Mean across the rows
arr.__

In [None]:
# Maximum across the columns
arr.__  

## Exercise

Now complete the following exercises to practice the concepts you learned in this session. 

This is similar to the exercise in Session 1, but this time using functions and loops. We have two lists, one of grades and one of student names, in matching order. Your task is to clean the names and build a dictionary that records each student's mark class:
- D: Distinction, \[70, 100\]
- M: Merit, \[60, 70\)
- P: Pass, \[50, 60\) 
- F: Fail, \[0, 50\)

In [None]:
# Part 1
# import package for regular expressions
__

# create the list of student names
names = ['SamWise! Gamgee', ' Frodo Baggins111 ', '*Pippin Took*', ' Merry Brandybuck', 
         ' Galadriel30', 'Gandalf:', 'Gollum.', 'Elrond?', 'Gimli', ' Legolas ', '!Arwen#', ' Boromir? ', '#Aragorn# ']

# characters to remove from the names
remove_chars = '[.,:_!#?*0-9]'

# create a function that removes remove_chars from a given name and converts it to title case
__ __(__):
    __ = __.__() #remove the beginning and ending white spaces and new lines
    __ = __.__(__, __, __) #remove unwanted characters
    __ = __.__() #first letter capital 
    return __

# Clean the names of the students
cleaned_names = __

In [None]:
# Part 2
# create a list of student IDs such that 'Samwise Gamgee' has ID QMUL_MSc_BA_1, 'Frodo Baggins' has ID QMUL_MSc_BA_2, etc.
prefix = 'QMUL_MSc_BA'
student_ids = __

# list of grades
grades = [90, 76, 55, 57, 82, 85, 68, 65, 63, 74, 71, 43, 78]

# write a function that returns F, P, M, or D depending on the score
__

# calculate student mark class and store it as a list
student_class = __

# Create a dictionary with the student ID as the key and a tuple of student name, grade, and mark class as the value,
# e.g. 'QMUL_MSc_BA_1': ('Samwise Gamgee', 90, 'D'), 'QMUL_MSc_BA_2': ('Frodo Baggins', 76, 'D'), ...
# Hint: use zip at two levels: zip the three lists to create the tuples, then zip the keys with those tuples
students = dict(zip(__, zip(__, __, __)))

# print the dictionary 
students

In [None]:
# Part 3 Numpy exercises

import numpy as np

# 1. Append a list to the end of an array
#Hint: Use the append function

arr1 = np.array([1, 2, 3, 4])

# Create a list of odd numbers from 5 to 25 (inclusive)
list1 = __(5, 26, __)

# append list1 to  arr1 
arr2 = __.append(__, __)

#print arr2
arr2

In [None]:
# 2. Test which elements of an array are contained in another array
# Hint: use the isin function

list2 = [0, -1, 1]
list3 = [9, -7, 1, 0, 0.01]

# Check whether the elements of list2 are in list3
__

In [None]:
# 3. Flatten a given list

list4 = [[9, 7, -3], [1, 0, 9], [-2, np.nan, np.inf]]

# Flatten list4
arr4 = np.__(list4)
# Print arr4
arr4

In [None]:
# 4. Write a function to find the smallest entry in an array whose value is greater than or equal to the mean

__ greater_than_mean_fn(arr):
    # Create a copy of the arr array
    x = arr.__()
    # Sort x (note: arr remains untouched)
    x.sort()
    # Find the first value that is greater than or equal to the mean of the array
    # Hint: x is already sorted. Filter for values greater than or equal to the mean and choose the one
    # with index zero
    val = __
    return val

# Test with arr5
arr5 = np.array([13, 6, -2, 8, 3, 7, 4])

# What is the smallest value in arr5 that is greater than or equal to its mean?
__(arr5)

In [None]:
# 5. Swap two rows in a 2d Numpy array

# Create a 4-by-5 2d numpy array of random integers from 0 to 100
arr6 = np.random.__(__, __, __)
print('Before swap: \n', arr6)

# Swap the first and last rows
arr6[__, __] = arr6[__, __]
print('After swap: \n', arr6)

## What to expect next?
- In our next session, we will move on to pandas to work with data.
- In your own time, you can work on other online exercises or tutorials on the topics you learned in this session. There are many free resources on the web.