# Functions

## Calling functions

We have already used several built-in functions, such as `range()` and `len()`.
A function is a block of code that takes input arguments and, optionally, returns values. In Python (and many other languages), a function call looks like this:

```python
output = function(input_argument)
```

For example, `range` is a function that produces a sequence of consecutive integers. In Python 3 it returns a lazy `range` object, so we wrap it in `list()` to see the numbers:

In [8]:
out = range(4)
list(out)

[0, 1, 2, 3]

`sum()` is another function; it adds up the numbers in a list:

In [9]:
a = [1,2,3]
sum(a)

6

The output of one function can be the input to another, for example:

In [10]:
sum(range(4))
range(len('Target'))

range(0, 6)

## Define our own functions

Let's now try to make our own functions. Before that, we need to be clear on the structure of a function:

```python
def func_name(arg1, arg2, arg3):
    output = arg1 + arg2 + arg3
    return output
```

\* *The `return` statement is optional; a function without one returns `None`.*
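
To illustrate the note above, here is a minimal sketch. The function names `add_and_return` and `add_and_print` are made up for this example. A function without a `return` statement still runs, but the call itself evaluates to `None`.

```python
def add_and_return(a, b):
    # Hands the result back to the caller
    return a + b

def add_and_print(a, b):
    # Prints the result but has no return statement
    print(a + b)

returned = add_and_return(1, 2)   # returned is 3
printed = add_and_print(1, 2)     # prints 3, but printed is None
print(returned, printed)
```

This is why a function that only prints cannot be used as the input to another function.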

In the following example, we make use of `sum`, a built-in function to sum up numeric iterables.

In [11]:
def mySum(list_to_sum):
    return sum(list_to_sum)

In [12]:
mySum(range(5))

10

A more complicated version that does not use the `sum` function:

In [69]:
def mySumUsingLoop(list_to_sum):
    sum_ = 0
    for i in list_to_sum:
        sum_ = sum_ + i  #same as sum_ += i
    return sum_

Let's call that function

In [14]:
mySumUsingLoop(range(5))

10
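
A quick way to gain confidence in a hand-written function is to compare it with the built-in on a few inputs. The sketch below repeats `mySumUsingLoop` so that it runs on its own; the test inputs are arbitrary choices for illustration.

```python
def mySumUsingLoop(list_to_sum):
    sum_ = 0
    for i in list_to_sum:
        sum_ += i
    return sum_

# Compare against the built-in sum on a few arbitrary inputs
checked = 0
for values in [range(5), [1, 2, 3], [], [-1.5, 1.5]]:
    assert mySumUsingLoop(values) == sum(values)
    checked += 1
print("checks passed:", checked)
```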

*These two functions do nothing interesting; they only illustrate how to build custom functions.*

---

# Libraries

Often, we need internal or external help for complicated computational tasks. On these occasions, we need to _import libraries_.

## Built-in libraries

Python provides many built-in packages that save us from reimplementing common and useful functions.

We will use __math__ as an example.

In [15]:
import math # use import to load a library

To use a function from a library, write `library_name.function_name`. For example, to calculate a logarithm with the `math` library, we call `math.log`:

In [70]:
x = 3
print(math.exp(x))
print(math.log(x))

You can also import one specific function:

In [17]:
from math import exp # You can import a specific function
print(exp(x)) # This way, you don't need to use math.exp but just exp

20.085536923187668


Or all:

In [18]:
from math import * # Import all functions

In [19]:
print(exp(x))
print(log(x)) # Before importing math, calling `exp` or `log` will raise errors

20.085536923187668
1.0986122886681098


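Depending on what you want to achieve, you can import a few functions or all of them (with `*`) from a package. Importing everything is convenient in short interactive sessions, but it can silently overwrite names you have already defined, so explicit imports are usually safer.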
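
As a small sketch of the import styles above: several specific functions can be imported in one statement, and they are the very same functions that `math.` gives access to. The variable `roundtrip` is introduced only for this example.

```python
import math
from math import exp, log   # import only the two functions we need

x = 3
print(math.exp(x) == exp(x))   # both names refer to the same function
roundtrip = log(exp(x))        # log undoes exp, so this is 3 up to rounding
print(roundtrip)
```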

## External libraries

Sometimes you'll want advanced utility functions that Python itself does not provide. Many useful packages are available from third-party developers.

We'll use __numpy__ as an example. (__numpy__, __scipy__, __matplotlib__, and probably __pandas__ will be the most important to you for data analysis.)


If you use Anaconda, these are ready for your use (they are preinstalled).

Loading external libraries works just the same as for built-in ones. To give a library a shorter _alias_ for easier access, import it with `import library_name as alias`. For example:

In [73]:
# After you install numpy, load it
import numpy as np # you can use np instead of numpy to call the functions in numpy package

In [21]:
x = np.array([[1,2,3], [4,5,7]], dtype=float) # create a numpy array object, specify the data type as float
print(x)
print(type(x))

[[1. 2. 3.]
 [4. 5. 7.]]
<class 'numpy.ndarray'>


---

# Quick Intro to Numpy

Instead of native data structures like the list, we use `numpy.ndarray` for data analytics most of the time. Arrays are less "flexible" than lists (all elements share one type), but they are easy to use and perform better. As NumPy's official documentation states:
> NumPy’s main object is the homogeneous multidimensional array. It is a table of elements (usually numbers), all of the same type, indexed by a tuple of positive integers.
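
To make the contrast with lists concrete, here is a minimal sketch. The names `lst`, `arr_demo`, and `mixed` are made up for this example.

```python
import numpy as np

lst = [1, 2, 3]
arr_demo = np.array([1, 2, 3])

print(lst + lst)            # lists concatenate: [1, 2, 3, 1, 2, 3]
print(arr_demo + arr_demo)  # arrays add element by element: 2, 4, 6

# Mixing integers and a float: every element is stored as a float
mixed = np.array([1, 2, 3.5])
print(mixed.dtype)          # float64
```

The single shared data type is what "homogeneous" refers to in the quote above.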

As we saw just now, the most common alias for `numpy` is `np`:

In [82]:
import numpy as np # The numpy module will be accessible as np.

or simply:

In [83]:
import numpy #the numpy module will be accessible as numpy.

Functions in a module are accessed with the `.` operator. For example,

In [24]:
np.array

<function numpy.array>

means the function `array` of the module `numpy` (imported here as `np`).

## Create arrays

Depending on what types of data we are going to work on later, arrays can be initialized "by hand" or through functions.

### By hand

This is very similar to creating a list of elements manually, except that we wrap the list in `np.array()`. The following creates a vector (an array of dimension 1):

In [85]:
arr = np.array([1,2,3,8])
arr

array([1, 2, 3, 8])

In [87]:
arr*2

array([ 2,  4,  6, 16])

In [26]:
arr.shape

(4,)

#### Multidimensional arrays: separated by commas

Create a 3 by 4 Matrix: 3 rows and 4 columns

In [94]:
arr = np.array([[2,3,8,7], [2,3,2,8], [5,0,8,8]])
arr

array([[2, 3, 8, 7],
       [2, 3, 2, 8],
       [5, 0, 8, 8]])

In [99]:
np.shape(arr) 
arr.shape

(3, 4)

In [32]:
arr

array([[2, 3, 8, 7],
       [2, 3, 2, 8],
       [5, 0, 8, 8]])

### By functions

There are many special array initialization methods to call:

In [106]:
np.zeros([3,5]) #initializes with zeros

array([[0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0.]])

In [109]:
np.ones([3,5], dtype='int') #initializes with ones

array([[1, 1, 1, 1, 1],
       [1, 1, 1, 1, 1],
       [1, 1, 1, 1, 1]])

In [119]:
np.eye(3)
#initializes an identity matrix (ones on the diagonal, zeros elsewhere)

array([[1., 0., 0.],
       [0., 1., 0.],
       [0., 0., 1.]])
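
A few more initializers in the same spirit, as a minimal sketch. `np.arange`, `np.linspace`, and `np.full` are standard NumPy functions; the variable names and the particular arguments are arbitrary choices for illustration.

```python
import numpy as np

steps = np.arange(0, 10, 2)    # like range(): start, stop (excluded), step
grid = np.linspace(0, 1, 5)    # 5 evenly spaced points from 0 to 1, both ends included
sevens = np.full((2, 3), 7)    # 2 by 3 array filled with the value 7

print(steps)    # 0, 2, 4, 6, 8
print(grid)     # 0, 0.25, 0.5, 0.75, 1
print(sevens)
```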

### Arithmetic operations

The rules for arithmetic operations are very similar to R or Matlab: they are generally element-wise.

In [120]:
arr

array([[2, 3, 8, 7],
       [2, 3, 2, 8],
       [5, 0, 8, 8]])

In [121]:
arr * 6

array([[12, 18, 48, 42],
       [12, 18, 12, 48],
       [30,  0, 48, 48]])

In [122]:
arr - 5

array([[-3, -2,  3,  2],
       [-3, -2, -3,  3],
       [ 0, -5,  3,  3]])

In [123]:
np.exp(arr)

array([[7.38905610e+00, 2.00855369e+01, 2.98095799e+03, 1.09663316e+03],
       [7.38905610e+00, 2.00855369e+01, 7.38905610e+00, 2.98095799e+03],
       [1.48413159e+02, 1.00000000e+00, 2.98095799e+03, 2.98095799e+03]])

Note that if we want to perform matrix multiplication, we need to use `@` or the `.dot` method, since `*` still means element-wise computation. (We will see matrix multiplication later in this class.)

In [40]:
arr_2 = np.array([[1], [3], [2], [0]])
arr_2

array([[1],
       [3],
       [2],
       [0]])

In [41]:
arr @ arr_2

array([[27],
       [15],
       [21]])

In [42]:
arr.dot(arr_2)

array([[27],
       [15],
       [21]])
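
The difference between `*` and `@` is easiest to see on two small square matrices. This is a minimal sketch; the matrices `A` and `B` are made up for the example.

```python
import numpy as np

A = np.array([[1, 2], [3, 4]])
B = np.array([[0, 1], [1, 0]])

elementwise = A * B   # multiplies matching entries
matmul = A @ B        # rows of A times columns of B

print(elementwise)    # entries: [[0, 2], [3, 0]]
print(matmul)         # entries: [[2, 1], [4, 3]]
```

Here `B` swaps the two columns of `A` under `@`, while `*` simply zeroes out the diagonal entries.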

We can do comparison operations on the entire array in one line:

In [125]:
arr

array([[2, 3, 8, 7],
       [2, 3, 2, 8],
       [5, 0, 8, 8]])

In [124]:
arr > 1

array([[ True,  True,  True,  True],
       [ True,  True,  True,  True],
       [ True, False,  True,  True]])

An `array` has many methods for manipulating data and calculating statistics. As with a module, these methods are accessed with the `.`

In [131]:
arr

array([[2, 3, 8, 7],
       [2, 3, 2, 8],
       [5, 0, 8, 8]])

Find the maximum value of the array

In [127]:
arr.max()

8

Find the maximum value of each row

In [130]:
arr.max(axis=1) # axis=1 runs across the columns, giving one maximum per row

array([8, 8, 8])

In [132]:
arr.max(axis=0) # axis=0 runs down the rows, giving one maximum per column

array([5, 3, 8, 8])
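
One way to remember what `axis` does is to look at the shape of the result: the axis you name is the one that disappears. A minimal sketch, using a copy of the matrix above under the made-up name `m`:

```python
import numpy as np

m = np.array([[2, 3, 8, 7],
              [2, 3, 2, 8],
              [5, 0, 8, 8]])   # shape (3, 4)

row_max = m.max(axis=1)   # collapses the 4 columns: one value per row
col_max = m.max(axis=0)   # collapses the 3 rows: one value per column

print(row_max.shape)   # (3,)
print(col_max.shape)   # (4,)
print(m.sum(axis=0))   # column sums: 9, 6, 18, 23
```

The same `axis` argument works for `sum`, `mean`, `min`, and most other reductions.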

Calculate the cumulative sum of the array elements

In [133]:
arr = np.array([1,2,5,8,-6,1])
arr.cumsum()

array([ 1,  3,  8, 16, 10, 11])

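For a 2-dimensional array, the cumulative sum can run along one axis:

In [49]:
arr = np.array([[1,2,3,8], [3,2,3,2], [4,5,0,8]]) # axis=1 needs a 2-dimensional array
arr.cumsum(axis=1)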

array([[ 1,  3,  6, 14],
       [ 3,  5,  8, 10],
       [ 4,  9,  9, 17]])

### Indexing and slicing

The most important part is how to index and slice a `np.array`. It is very similar to indexing a `list`, except that we may now need more than one index, because most real-life datasets have more than one dimension.

#### 1 dimensional case

In [134]:
a1 = np.array([1,2,8,100])
a1

array([  1,   2,   8, 100])

In [51]:
a1[0]

1

In [136]:
a1[-2]

8

In [138]:
a1[0:3]

array([1, 2, 8])

In [141]:
a1[[0,2,3]]

array([  1,   8, 100])

We can also use boolean values to index
- `True` means we want this element

In [149]:
a1 > 3

array([False, False,  True,  True])

In [148]:
a1[a1>3]

array([  8, 100])

The function `np.where` returns the indices of all the elements that match a condition. For example

In [150]:
np.where(a1>3)

(array([2, 3]),)

`np.where` returns a tuple of arrays, one per dimension. For a 1-dimensional array, we are interested in the first one:

In [151]:
np.where(a1>3)[0]

array([2, 3])
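
Boolean indexing and `np.where` are two routes to the same elements. A minimal sketch; the names `mask` and `positions` are introduced only for this example.

```python
import numpy as np

a1 = np.array([1, 2, 8, 100])

mask = a1 > 3                   # boolean array, one entry per element
positions = np.where(mask)[0]   # integer positions where mask is True

print(a1[mask])        # values 8 and 100
print(a1[positions])   # same values, selected by position
print(mask.sum())      # True counts as 1, so this counts the matches: 2
```

Use the mask when you only need the values, and `np.where` when you need to know where they sit.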

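The same boolean indexing works on the rows of a 2-dimensional array. Below, the mask `people[:,1]>23` keeps the rows whose second column exceeds 23, and a slice that runs past the end of an axis (`:5` on 2 columns) is simply cut off at the last element:

In [155]: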
people = np.array([[182,23],[155,22],[165,45],[182,23],[155,22],[165,45]])
people

array([[182,  23],
       [155,  22],
       [165,  45],
       [182,  23],
       [155,  22],
       [165,  45]])

In [164]:
people[people[:,1]>23]

array([[165,  45],
       [165,  45]])

In [167]:
people[0:3,:5]

array([[182,  23],
       [155,  22],
       [165,  45]])

In the `np.where` output above, the indices 2 and 3 mean that the third and fourth elements of `a1` are larger than 3.

#### 2 dimensional case

In [170]:
arr = people
arr

array([[182,  23],
       [155,  22],
       [165,  45],
       [182,  23],
       [155,  22],
       [165,  45]])

Using only one number to index returns a subset of the original multidimensional array, which is also an array:

In [171]:
arr[0]

array([182,  23])

Since we have 2 dimensions now, there are 2 indices we can use for indexing the 2 dimensions respectively

In [172]:
arr[0,0]

182

We can use `:` to indicate everything along that axis

In [173]:
arr[1]

array([155,  22])

In [174]:
arr[1, :]

array([155,  22])

In [175]:
arr[1,:] == arr[1]

array([ True,  True])

In [176]:
arr[:, 1]

array([23, 22, 45, 23, 22, 45])
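
Slices and lists of indices can be combined across the two dimensions. A minimal sketch that rebuilds the `people` array so it runs on its own; the variable names on the left are made up for this example.

```python
import numpy as np

people = np.array([[182, 23], [155, 22], [165, 45],
                   [182, 23], [155, 22], [165, 45]])

first_two_rows = people[0:2, :]   # rows 0 and 1, all columns
first_column = people[:, 0]       # column 0 of every row
picked = people[[0, 2], 1]        # column 1 of rows 0 and 2

print(first_two_rows)
print(first_column)   # 182, 155, 165, 182, 155, 165
print(picked)         # 23 and 45
```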

## Generating random numbers

NumPy has a submodule called `random` that can generate random numbers.

The following generates a 2 by 3 matrix of random numbers drawn uniformly from the interval [0, 1):

In [188]:
np.random.random([2,3])

array([[0.01003552, 0.81251982, 0.62400983],
       [0.70551282, 0.17651142, 0.3261446 ]])

We can also draw random variables from other distributions. For example, a Gaussian with mean 0 and standard deviation 1 (this was denoted $N(0,1)$ in class):

In [187]:
xt = np.random.normal(0, 1, size=[3,4])
xt

array([[ 0.34692631,  0.60383541, -0.58724542,  1.46870074],
       [-1.78331329,  1.95182965, -0.21099833, -0.30005356],
       [-0.45585934,  0.50812318,  0.67290501, -0.777097  ]])

Sample average and standard deviation can be computed as:

In [185]:
np.mean(xt) #Sample average

0.0010699630192597923

In [186]:
np.std(xt) # Standard deviation; divides by n by default (pass ddof=1 to divide by n-1)

0.9997855956318246
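
Random draws change on every run, which makes results hard to reproduce. A minimal sketch of one way around this, assuming NumPy 1.17 or later for `np.random.default_rng`; the seed value `0` and the sample size are arbitrary choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(seed=0)      # seeded generator: same numbers on every run
sample = rng.normal(0, 1, size=10_000)   # 10,000 draws from N(0, 1)

print(sample.mean())        # close to 0
print(sample.std())         # divides by n (ddof=0, the default)
print(sample.std(ddof=1))   # divides by n - 1 (the usual sample estimate)
```

With this many draws, the two standard deviations differ only slightly; the gap matters more for small samples.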