# Week-06: Functions and Arrays

## 1. Import Libraries 

In [None]:
# the "NumPy" library is used for mathematical operations
# the "matplotlib" library is for generating graphs
# the "pandas" library is for manipulating datasets

import numpy as np
import matplotlib.pyplot as plt
import pandas as pd

## 2. Introduction to Functions 

<font size="4"> 

A function is ...

- A **block of reusable code** that performs a specific task
- Functions avoid repetition
- As our code grows larger, functions make it more manageable



<font size = "4">

Pass arguments by assigning them to a function's parameters, either by position or by name

In [None]:
# Here "low", "high", and "size" are parameters
# They get assigned the arguments "-2", "2", and "10", respectively
# The return value is a vector of random draws
np.random.seed(100)
vec_x = np.random.uniform(low=-2, high=2, size=10)

In [None]:
vec_x

In [None]:
# Write your own code:
# Define vec_y by writing arguments by position:

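# What if you write vec_y = np.random.uniform(10, -2, 2)?
# By position this means low=10, high=-2, size=2, so the bounds are reversed.
# The NumPy documentation describes the result as undefined when high < low,
# so do not rely on this call even if it runs.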
vec_y = np.random.uniform(10, -2, 2)
print(vec_y)

In [None]:
# Write your own code
# Define vec_z by writing arguments by name (with whatever order you like)
vec_z = np.random.uniform(high=2, low=-2, size=10)
print(vec_z)
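
The difference between passing arguments by name and by position can be checked directly. The sketch below resets the seed before each call so the three draws are comparable; the variable names `draw_by_name`, `draw_reordered`, and `draw_by_position` are made up for this illustration.

```python
import numpy as np

# Keyword arguments can be written in any order.
np.random.seed(100)
draw_by_name = np.random.uniform(low=-2, high=2, size=10)

np.random.seed(100)
draw_reordered = np.random.uniform(size=10, high=2, low=-2)

# Positional arguments must follow the order (low, high, size).
np.random.seed(100)
draw_by_position = np.random.uniform(-2, 2, 10)

# With the same seed, all three calls should return the same vector.
print(np.array_equal(draw_by_name, draw_reordered))
print(np.array_equal(draw_by_name, draw_by_position))
```

Swapping the positional arguments, as in the `vec_y` exercise, changes which parameter each number is assigned to, which is why the order matters there.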

## 3. Customize Functions 

<font size = "4">

You can write your own functions:

```python

    #---- DEFINE
    def my_function(parameter):
        body
        return expression

    #---- RUN
    my_function(parameter = argument) 

    #---- RUN
    my_function(argument)
```
<br>

<font size = "4">

Example 1: Write a function named `fun_poisson_pmf` that calculates the PMF of a Poisson distribution.

The function takes the parameters `lambda_val` and `k` and returns `pmf_val`.

$P(X=k)=\lambda^k \exp(-\lambda) / k!, k\in\mathbb{N}_0$

In [None]:
# Write your own code:
def compute_fact(input_val):
    fact_val = np.prod(np.arange(1, input_val+1))
    return fact_val
    
def fun_poisson_pmf(lambda_val, k):
    # k_fact = np.cumprod(np.arange(1,k+1))[-1]
    # k_fact = np.prod(np.arange(1,k+1))
    # k_fact = compute_fact(k)
    k_fact = compute_fact(input_val=k)
    pmf_val = lambda_val**k * np.exp(-lambda_val) / k_fact
    return pmf_val

In [None]:
# Write your own code:
# Call your function
pmf_1 = fun_poisson_pmf(lambda_val=1.5, k=1)
print(pmf_1)

pmf_1 = fun_poisson_pmf(1.5, 1)
print(pmf_1)

lambda_val_2 = 1.5
k_2 = 1
pmf_1 = fun_poisson_pmf(lambda_val_2, k_2)
print(pmf_1)

pmf_1 = fun_poisson_pmf(lambda_val=lambda_val_2, k=k_2)
print(pmf_1)
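
One way to sanity-check a PMF is to confirm that its values add up to (approximately) 1. The sketch below is an illustration only: `poisson_pmf_check` is a hypothetical helper that uses `math.factorial` for $k!$ instead of `np.prod`, because a product over fixed-width NumPy integers can overflow silently once $k$ gets large.

```python
import math

def poisson_pmf_check(lambda_val, k):
    # Hypothetical helper: same formula as fun_poisson_pmf,
    # but k! comes from math.factorial (exact Python integers).
    return lambda_val**k * math.exp(-lambda_val) / math.factorial(k)

# Single value, comparable to fun_poisson_pmf(lambda_val=1.5, k=1)
pmf_k1 = poisson_pmf_check(lambda_val=1.5, k=1)
print(pmf_k1)

# Summing over k = 0, ..., 49 should be very close to 1
pmf_total = sum(poisson_pmf_check(1.5, k) for k in range(50))
print(pmf_total)
```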

In [None]:
# You can now compute the formula with different values
# Let's see how much one can gain by investing 50k and 100k
# Earning 10% a year for 10 years


<font size = "4">

Example 2:

- Write a function that calculates <br>
 $f(x) = x^2 + 2x + 1$.

 - Test your function with $x = 2$ and $x = 3$


In [None]:
# Write your own code here
def fn_quadratic(input_x):
    y = input_x**2 + 2*input_x + 1
    return y

In [None]:
# call the function
y_1 = fn_quadratic(input_x=2)
y_2 = fn_quadratic(input_x=3)
print(y_1, y_2)

y_1 = fn_quadratic(2)
print(y_1)
x_val = 2
y_1 = fn_quadratic(input_x=x_val)
print(y_1)
y_1 = fn_quadratic(x_val)
print(y_1)
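
Because the body of `fn_quadratic` only uses arithmetic operators, it should also accept a NumPy array and evaluate every element at once. A small sketch, with the function repeated so the block runs on its own and `x_values` chosen for illustration:

```python
import numpy as np

def fn_quadratic(input_x):
    y = input_x**2 + 2*input_x + 1
    return y

# Evaluate x = 2 and x = 3 in a single call
x_values = np.array([2, 3])
y_values = fn_quadratic(x_values)
print(y_values)

# x^2 + 2x + 1 factors as (x + 1)^2, which gives an independent check
print(np.array_equal(y_values, (x_values + 1)**2))
```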

<font size='4'>
Example 3:<br>
Recall the nested for loop from the previous class. Below is the prompt and the original code.

- The Central Limit Theorem (CLT) is a fundamental concept in statistics.
- It states that the distribution of the mean (or sum) of many independent, identically distributed random variables with finite variance approaches a normal distribution, regardless of the original distribution.
- This is true even if the original distribution is NOT normal.
- Let $\bar{X}$ be the sample mean of a random vector $(X_1,\cdots,X_n)$.
- What happens to $\bar{X}$ with different $n$?
    - The Central Limit Theorem makes a prediction!
    - It says that the distribution of $\bar{X}$ gets closer to a bell shape as $n$ grows.
- Let's verify the CLT by simulating random vectors from a uniform distribution from $-4$ to $5$.
- The previous code looks like:


In [None]:
iteration_num = 1000
sample_size_ls = [1,10,50,100]
unif_vec_ls_ls = []

for sample_size in sample_size_ls:

    # initialize unif_vec_sample_size_ls first
    unif_vec_sample_size_ls = []
    
    for iter_num in range(iteration_num):
        unif_vec_sample_size_iter = np.random.uniform(
            low=-4, high=5, size=sample_size
        )
        unif_vec_sample_size_ls.append(
            float(np.mean(unif_vec_sample_size_iter))
        )

    unif_vec_ls_ls.append(unif_vec_sample_size_ls)

<font size='4'>

- We can write two functions to simplify the process.
- The code will be easier to understand once we wrap it in functions.
- Try to develop the habit of decomposing your tasks into multiple functions.
- We will start with the smallest task unit: **computing the sample mean of a uniform vector**.
    - Let the function name be `compute_unif_mean` with parameters `low_val`, `high_val`, and `sample_size_n`.
    - Generate a uniform vector using `np.random.uniform()`, assign it to a variable named `unif_vec`, and return its sample mean using `np.mean()`.
    - Inside the function, pass each argument either by name (`parameter_name = argument_name`) or by position (`argument_name`, following the order of the parameters).

In [None]:
# Write your own code
def compute_unif_mean(low_val, high_val, sample_size_n):
    unif_vec = np.random.uniform(low_val, high_val, sample_size_n)
    #  -> by position, the order matters.
    # unif_vec = np.random.uniform(low=low_val, high=high_val, 
    # size=sample_size_n) -> by name, the order does not matter
    return float(np.mean(unif_vec))

<font size='4'>
    
- Then, we work on the larger task unit - **looping through `iteration_num` to build a list of sample mean values**.
    - Let the function name be `loop_iteration_num_build_ls` with parameters `low_val`, `high_val`, `sample_size_n`, and `iteration_n`.
        - Remember to include the parameters for the previous function if you want to call it inside it.
    - Initialize an empty list called `unif_vec_sample_size_mean_ls`.
    - Write a for loop to iterate `range(iteration_n)`, compute the sample mean of each random vector per iteration, and append it to the list initialized just now.
    - Return the list `unif_vec_sample_size_mean_ls`.

In [None]:
# Write your own code
def loop_iteration_num_build_ls(
    low_val, high_val, sample_size_n, iteration_n
):
    unif_vec_sample_size_mean_ls = []
    for iter_n in range(iteration_n):
        sample_mean_iter_n = compute_unif_mean(
            low_val, high_val, sample_size_n
        )
        unif_vec_sample_size_mean_ls.append(sample_mean_iter_n)
    return unif_vec_sample_size_mean_ls

<font size='4'>

- The original process is decomposed into a few smaller tasks.
- Initialize the variables that will be passed as arguments with proper values, i.e., `low_value`, `high_value`, and `iteration_num`.
    - Note that it is okay for variable names to differ from the function's parameter names, but you still have to pass them correctly, either by position or by name.
- Our code can be further simplified to the following format.

In [None]:
# Write your own code
iteration_num = 1000
sample_size_ls = [1,10,50,100]
unif_vec_ls_ls = []

low_value = -4
high_value = 5

for sample_size_n in sample_size_ls:
    unif_vec_iter_ls = loop_iteration_num_build_ls(
        low_value, high_value, sample_size_n, iteration_num
    )
    unif_vec_ls_ls.append(unif_vec_iter_ls)

print(len(unif_vec_ls_ls))
print(unif_vec_ls_ls[0][:10])
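
A numerical check can complement the simulation. For a uniform distribution on $[-4, 5]$ the standard deviation of a single draw is $(5-(-4))/\sqrt{12}$, so the standard deviation of the sample mean should be roughly that value divided by $\sqrt{n}$. The sketch below repeats the two functions so it runs on its own; `theory_sd` and `spread_by_n` are names made up for this illustration, and the seed is an arbitrary choice.

```python
import numpy as np

def compute_unif_mean(low_val, high_val, sample_size_n):
    unif_vec = np.random.uniform(low=low_val, high=high_val, size=sample_size_n)
    return float(np.mean(unif_vec))

def loop_iteration_num_build_ls(low_val, high_val, sample_size_n, iteration_n):
    unif_vec_sample_size_mean_ls = []
    for iter_n in range(iteration_n):
        unif_vec_sample_size_mean_ls.append(
            compute_unif_mean(low_val, high_val, sample_size_n)
        )
    return unif_vec_sample_size_mean_ls

np.random.seed(100)
low_value, high_value, iteration_num = -4, 5, 1000

# Standard deviation of one Uniform(low, high) draw
theory_sd = (high_value - low_value) / np.sqrt(12)

spread_by_n = {}
for sample_size_n in [1, 10, 50, 100]:
    mean_ls = loop_iteration_num_build_ls(
        low_value, high_value, sample_size_n, iteration_num
    )
    spread_by_n[sample_size_n] = float(np.std(mean_ls))
    # simulated spread next to the theoretical value theory_sd / sqrt(n)
    print(sample_size_n, spread_by_n[sample_size_n],
          theory_sd / np.sqrt(sample_size_n))
```

The two printed columns should be close for every $n$, and both should shrink as $n$ grows.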

## 4. Lambda Functions 

<font size = "4">

"Lambda Functions" are defined in one line:

```python
my_function = lambda parameters: expression
```

<font size = "4">

Calculate $x + y$

In [None]:
def fn_sum(x, y):
    return x + y

In [None]:
fn_sum(1,2)

In [None]:
# (a) Define function
fn_sum = lambda x,y: x + y

# (b) Run function
fn_sum(1,2)
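
The `def` version and the lambda version behave the same when called. One common reason to use a lambda is to pass a short function directly as an argument. The sketch below uses Python's built-in `sorted` with a `key` function; the names `fn_sum_def`, `fn_sum_lambda`, and `pairs` are made up for this illustration.

```python
def fn_sum_def(x, y):
    return x + y

fn_sum_lambda = lambda x, y: x + y

# Both definitions give the same result
print(fn_sum_def(1, 2) == fn_sum_lambda(1, 2))

# A lambda passed as an argument: sort the pairs by their second element
pairs = [("a", 3), ("b", 1), ("c", 2)]
sorted_pairs = sorted(pairs, key=lambda pair: pair[1])
print(sorted_pairs)
```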

<font size = "4">
Example 4: <br>
Rewrite the PMF of Poisson distribution using a Lambda function.

In [None]:
# Write your own code:
lambda_fn_poisson_pmf = lambda lambda_val,k: lambda_val**k * np.exp(-lambda_val) / np.prod(np.arange(1, k+1))

In [None]:
print(lambda_fn_poisson_pmf(1.5, 1))

<font size = "4">
Example 5:
    
Boolean + Functions

- Write a function called `fn_iseligible_vote`
- This function returns a boolean value that checks whether $age \ge 18$
- Test your function with $age = 20$

In [None]:
# Write your own code
fn_iseligible_vote = lambda age: age>=18

In [None]:
fn_iseligible_vote(20)
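
It is worth testing a boolean function at its boundary, since `>=` and `>` only differ when the age is exactly 18. A minimal sketch, with the test ages chosen for illustration:

```python
fn_iseligible_vote = lambda age: age >= 18

# Check one value below, at, and above the threshold
results = {}
for age in [17, 18, 20]:
    results[age] = fn_iseligible_vote(age)
    print(age, results[age])
```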

## 5. Functions for Visualisation 

<font size = "4">
Returning a value is not always necessary. You can write:

```python

    #---- DEFINE
    def my_function(parameter):
        body
```
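
A function without a `return` statement still hands something back to the caller: the special value `None`. The sketch below uses a hypothetical function `print_greeting` to show this.

```python
def print_greeting(name):
    # The body only prints; there is no return statement.
    print("Hello, " + name)

result = print_greeting("class")

# Nothing was returned, so result holds None
print(result is None)
```

This is why functions that only draw a plot or print a message are usually called on their own line rather than assigned to a variable.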

<font size = "4">

Example 6: A customized plot

- You can use functions to store your favorite aesthetic
- The function name: `red_histogram`.
- Parameters include `vec_x` and `title`.

In [None]:
# Define the function
def red_histogram(vec_x,title):
    plt.hist(x = vec_x, color = "red")
    plt.title(title)
    plt.ylabel("Frequency")
    plt.show()

carfeatures = pd.read_csv("data/features.csv")

red_histogram(vec_x = carfeatures["weight"], title = "Histogram")
red_histogram(vec_x = carfeatures["acceleration"], title = "Histogram")


<font size = "4">

Example 7:

Create a function named `red_scatterplot` that draws a red scatter plot <br>
 and takes `vec_y` and `vec_x` as inputs.

When you call the function, plot **acceleration** against **weight**.

In [None]:
# Write your own code
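# Define the function
# (a generalized version of red_scatterplot: the color defaults to 'red'
# but can be changed, and the axis labels and title are parameters too)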
def color_scatterplot(vec_x, vec_y, name_x, name_y, title, color='red'):
    plt.scatter(vec_x, vec_y, color=color)
    plt.xlabel(name_x)
    plt.ylabel(name_y)
    plt.title(title)
    plt.show()

In [None]:
# call the function
acce = carfeatures["acceleration"]
weight = carfeatures["weight"]
# color_scatterplot(acce, weight, 'acceleration', 'weight', 'scatterplot')
color_scatterplot(acce, weight, 'acceleration', 'weight', 'scatterplot', 
                  'skyblue')

## 6. Arrays


<font size='4'>

- An array is a grid that contains values of the same data type.
- Previously, we covered 1-dimensional arrays, but arrays can have more dimensions.

In [None]:
# 1-dim array
arr_1d = np.array([1,2,3], dtype=np.int64)
# dtype is optional: depending on your operating system, 
# it is np.int32 or np.int64 by default.
print(arr_1d)

# 2-dim array
arr_2d = np.array([[1,2,3], [4,5,6]], dtype=np.int64)
print(arr_2d)

# 3-dim array
arr_3d = np.array([[[1,2,3],
                    [4,5,6]],
                   [[-1,-2,-3],
                    [-4,-5,-6]]], dtype=np.int64)
print(arr_3d)
print()

<font size='4'>

- An array holds and represents regular data in a *structured* way.
- An array contains information about the *raw data (memory address)*, how to *locate an element (shape and indexing)*, and how to *interpret an element (data type)*.
    - For this class, you should at least know the shape of a two-dimensional array (matrix).
    - The shape of a 2d-array is a pair of integers stored as a `tuple`, a built-in Python data type.
    - Axis 0 corresponds to rows and axis 1 to columns, so the shape is (# of rows, # of columns).

In [None]:
# print out the memory buffer object (its printed form shows the memory address)
print(arr_2d.data)

# print out shape
print(arr_2d.shape)

# print out data type
print(arr_2d.dtype)
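
The shape tuple and the axis numbering can be explored with a few short calls. The sketch below rebuilds `arr_2d` so it runs on its own; `n_rows` and `n_cols` are names chosen for this illustration.

```python
import numpy as np

arr_2d = np.array([[1, 2, 3], [4, 5, 6]], dtype=np.int64)

# The shape tuple can be unpacked into rows (axis 0) and columns (axis 1)
n_rows, n_cols = arr_2d.shape
print(n_rows, n_cols)

# Number of axes (dimensions)
print(arr_2d.ndim)

# Locate one element: row index 1, column index 2
print(arr_2d[1, 2])

# Summing along axis 0 collapses the rows: one sum per column
print(arr_2d.sum(axis=0))

# Summing along axis 1 collapses the columns: one sum per row
print(arr_2d.sum(axis=1))
```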

<font size='4'>

Useful functions to create and initialize a NumPy array:

- `np.ones()`
- `np.zeros()`
- `np.random.random()`
- `np.empty()`
- `np.full()`
- `np.arange()`
- `np.linspace()`

In [None]:
# Create an array of ones
ones_array = np.ones((3, 4))
print("Ones Array:")
print(ones_array)
print()

# Create an array of zeros
zeros_array = np.zeros((2, 3, 4), dtype=np.int16)
print("Zeros Array:")
print(zeros_array)
print()

# Create an array with random values
# Return random floats in the half-open interval [0.0, 1.0). 
random_array = np.random.random((2, 2))
print("Random Array:")
print(random_array)
print()

# Create an "empty" array: the entries are not initialized,
# so the printed values are whatever happened to be in memory
empty_array = np.empty((3, 2))
print("Empty Array:")
print(empty_array)
print()

# Create a full array
full_array = np.full((2, 2), 7)
print("Full Array:")
print(full_array)
print()

# Create an array of evenly-spaced values
arange_array = np.arange(10, 25, 5)
print("Arange Array:")
print(arange_array)
print()

# Create an array of evenly spaced values with a fixed number of points
# The stop value is included by default (endpoint=True); you can exclude it
# by passing the optional parameter endpoint=False
# https://numpy.org/doc/stable/reference/generated/numpy.linspace.html 
print(np.linspace(0,2,9))
print(np.linspace(0,2,9,endpoint=False))
print()
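
`np.arange()` and `np.linspace()` both produce evenly spaced values but are controlled differently: `np.arange()` takes a step size and excludes the stop value, while `np.linspace()` takes the number of points. The sketch below uses the `retstep=True` option of `np.linspace()` to also return the spacing; the variable names are chosen for this illustration.

```python
import numpy as np

# Step of 5, stop value 25 is excluded
arange_vals = np.arange(10, 25, 5)
print(arange_vals)

# 9 points from 0 to 2, stop value included
grid, step = np.linspace(0, 2, 9, retstep=True)
print(grid[-1], step)

# 9 points from 0 towards 2, stop value excluded
grid_open, step_open = np.linspace(0, 2, 9, endpoint=False, retstep=True)
print(grid_open[-1], step_open)
```

With the endpoint included the 9 points are separated by 8 gaps, so the spacing is $2/8$; with the endpoint excluded the spacing becomes $2/9$.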

<font size='4'>
    
Useful functions to load NumPy arrays from text (.txt) files:
- `np.loadtxt()`
- `np.genfromtxt()`

In [None]:
# This is your data in the text file
# Value1  Value2  Value3
# 0.2536  0.1008  0.3857
# 0.4839  0.4536  0.3561
# 0.1292  0.6875  0.5929
# 0.1781  0.3049  0.8928
# 0.6253  0.3486  0.8791

# Import your data
x, y, z = np.loadtxt('./data/working_example_data.txt', skiprows=1, unpack=True)
# The unpack=True option returns the columns of the dataset as separate arrays.
# https://numpy.org/doc/stable/reference/generated/numpy.loadtxt.html
print(x)
print(y)
print(z)
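
To try `np.loadtxt()` without a file on disk, the text can be wrapped in `io.StringIO`, which `np.loadtxt()` accepts in place of a file name. This is only a sketch: `text_data` holds the first three rows of the example above, and `full_arr` is a name chosen for this illustration.

```python
import io
import numpy as np

text_data = """Value1  Value2  Value3
0.2536  0.1008  0.3857
0.4839  0.4536  0.3561
0.1292  0.6875  0.5929
"""

# Without unpack: one 2-d array, one row per line of the file
full_arr = np.loadtxt(io.StringIO(text_data), skiprows=1)
print(full_arr.shape)

# With unpack=True: one 1-d array per column
x, y, z = np.loadtxt(io.StringIO(text_data), skiprows=1, unpack=True)
print(x)

# Each unpacked array matches the corresponding column of full_arr
print(np.array_equal(x, full_arr[:, 0]))
```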

In [None]:
# Your data in the text file
# Value1  Value2  Value3
# 0.4839  0.4536  0.3561
# 0.1292  0.6875  MISSING
# 0.1781  0.3049  0.8928
# MISSING 0.5801  0.2038
# 0.5993  0.4357  0.7410

# np.genfromtxt() can handle missing values:
# filling_values sets the value used in place of missing entries
# https://numpy.org/doc/stable/reference/generated/numpy.genfromtxt.html#numpy.genfromtxt
my_arr2 = np.genfromtxt('./data/working_example_data_2.txt', skip_header=1, filling_values=-9)
print(my_arr2)
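
The same `io.StringIO` approach gives a quick way to experiment with `np.genfromtxt()`. In the sketch below, `missing_values="MISSING"` is added as an assumption to state explicitly which marker counts as missing; it is not part of the original call. The names `text_data` and `filled_arr` are chosen for this illustration.

```python
import io
import numpy as np

text_data = """Value1  Value2  Value3
0.4839  0.4536  0.3561
0.1292  0.6875  MISSING
MISSING 0.5801  0.2038
"""

filled_arr = np.genfromtxt(
    io.StringIO(text_data),
    skip_header=1,
    missing_values="MISSING",   # assumption: name the marker explicitly
    filling_values=-9,          # value stored in place of missing entries
)
print(filled_arr)

# Row and column indices of the entries that were filled in
print(np.argwhere(filled_arr == -9))
```

Choose a filling value that cannot occur in the real data, so filled entries are easy to find and exclude later.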

<font size='4'>

How to save your numpy arrays?

- `np.savetxt()`: save an array to a text file
- `np.save()`: save an array to a binary file in NumPy .npy format
- `np.savez()`: save several arrays into an uncompressed .npz archive
- `np.savez_compressed()`: save several arrays into a compressed .npz archive
- `np.load()`: load arrays from `.npy` and `.npz` formats.
    - When the file holds multiple arrays, we need extra syntax (either `.keys()` or a `with` statement) to access a specific array.
    - We will cover this in more detail later.

In [None]:
# Let's take my_arr2 as an example.

np.savetxt('./data/my_arr2_delimiter_space.txt', my_arr2, delimiter=' ')
np.savetxt('./data/my_arr2_delimiter_comma.txt', my_arr2, delimiter=',')

np.save('./data/my_arr2.npy', my_arr2)
# https://numpy.org/doc/stable/reference/generated/numpy.save.html

np.savez('./data/my_arr2.npz', my_arr2)
# https://numpy.org/doc/stable/reference/generated/numpy.savez.html#numpy.savez

np.savez_compressed('./data/my_arr2_comp.npz', my_arr2)
# https://numpy.org/doc/stable/reference/generated/numpy.savez_compressed.html#numpy.savez_compressed

# save multiple arrays
np.savez('./data/my_arr2_twice.npz', my_arr2, my_arr2)
np.savez_compressed('./data/my_arr2_twice_comp.npz', my_arr2, my_arr2)

my_arr2_twice = np.load('./data/my_arr2_twice.npz')
print(my_arr2_twice)
print(my_arr2_twice['arr_0'])
print(my_arr2_twice['arr_1'])
print(' ')

In [None]:
# Or we can use a "with" statement, which closes the file automatically
with np.load('./data/my_arr2_twice.npz') as my_arr2_twice:
    arr_0 = my_arr2_twice['arr_0']
    arr_1 = my_arr2_twice['arr_1']
print(arr_0)
print(arr_1)
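
A save-and-load round trip is a simple way to confirm that nothing is lost. The sketch below writes to a temporary directory so it does not touch the `data` folder, and it passes the arrays to `np.savez()` as keyword arguments so they are stored under those names instead of `arr_0` and `arr_1`. The array names and file names are made up for this illustration.

```python
import os
import tempfile
import numpy as np

arr_a = np.arange(6).reshape(2, 3)
arr_b = np.linspace(0, 1, 5)

with tempfile.TemporaryDirectory() as tmp_dir:
    npy_path = os.path.join(tmp_dir, "arr_a.npy")
    npz_path = os.path.join(tmp_dir, "both.npz")

    # Single array: .npy format
    np.save(npy_path, arr_a)
    arr_a_loaded = np.load(npy_path)

    # Several arrays: keyword names become the keys in the archive
    np.savez(npz_path, first=arr_a, second=arr_b)
    with np.load(npz_path) as archive:
        stored_names = sorted(archive.files)
        first_loaded = archive["first"]
        second_loaded = archive["second"]

print(stored_names)
print(np.array_equal(arr_a, arr_a_loaded))
print(np.array_equal(arr_b, second_loaded))
```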