# NumPy

NumPy is a powerful package for fast array and matrix computing. It is the foundation for many other scientific computing and data analytics packages in Python. Its main features include:

- a powerful N-dimensional array object
- sophisticated (broadcasting) functions
- tools for integrating C/C++ and Fortran code
- useful linear algebra, Fourier transformation, and random number capabilities

We only cover NumPy topics frequently used in analytics. See <a href="https://www.numpy.org/devdocs/user/quickstart.html">the official NumPy tutorial at numpy.org</a> for a complete guide.

## Basics

### Import the NumPy package
To use NumPy, you need to first import the NumPy package. It is a well-accepted convention to use "np" as an alias for this package:

In [None]:
import numpy as np

### Create ndarray object
At the core of NumPy is the `ndarray` object, an N-dimensional array object of homogeneous data types. `ndarray` is also called `array` as the latter is an alias of the former in NumPy.

We can convert Python lists to ndarray objects using `numpy.array()`:

In [None]:
# create ndarray from list
x = [1, 2, 3, 4]
arr1 = np.array(x)

In [None]:
type(arr1)

In [None]:
type(x)

Exercise: Create an ndarray from the list [[1, 2, 3, 4], [5, 6, 7, 8]], and print it.

We can also create ndarray objects using built-in NumPy functions: `arange`, `ones`, `zeros`, `empty`, `eye`

In [None]:
# arange() for ndarray is similar to range() for list
np.arange(2, 10, 2)

In [None]:
np.ones(4)

In [None]:
np.ones([2, 3])
# This can also be written as np.ones((2, 3))
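The other constructors listed above follow the same calling pattern as `ones`. The cell below is a small sketch of `zeros` and `eye`; the variable names `zeros_row` and `identity` are made up for illustration.

```python
import numpy as np

# zeros() works like ones(), but fills the array with 0.0
zeros_row = np.zeros(3)
print(zeros_row)        # [0. 0. 0.]

# eye(n) builds an n x n identity matrix: ones on the diagonal, zeros elsewhere
identity = np.eye(3)
print(identity)

# both functions return floating-point arrays unless a dtype is given
print(zeros_row.dtype)  # float64
```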

### Exercises

Create a 3x4 array of all ones.

Create a 4x4 array of all zeros.

Try `np.empty(5)` and `np.empty(10)`, and observe the outputs. It is a good habit to google any function that is not obvious to you when trying it.
+ Or, use the Tab key for quick help.

### The properties of ndarray object
Check the array **properties** `.ndim`, `.shape`, and `.dtype`.

In [None]:
x = np.array([[ 0,  1,  2,  3,  4],
              [ 5,  6,  7,  8,  9],
              [10, 11, 12, 13, 14]])

Find out the number of dimensions of x

In [None]:
x.ndim

Find out the shape of x

Find out the data type of x

Notice that `type(x)` and `x.dtype` serve different purposes!
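To make the distinction concrete, here is a short sketch using a separate, made-up array named `a`: `type()` reports the Python class of the container, while `.dtype` reports the type of the elements stored inside it.

```python
import numpy as np

a = np.array([1.5, 2.5, 3.5])

# type() describes the container object itself
print(type(a))    # <class 'numpy.ndarray'>

# .dtype describes the elements stored in the container
print(a.dtype)    # float64
```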

### Change data types for ndarrays

In [None]:
x = np.array([1.3, 2.4, 3.5])

In [None]:
x.dtype

There are two equivalent ways to change the data type of an ndarray object. Both return a new array and leave the original unchanged:

In [None]:
# option 1
y = np.array(x, dtype='int')
y.dtype

In [None]:
# option 2
y = x.astype(int)
y
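One detail worth checking when converting floats to integers is that the fractional part is dropped rather than rounded. The sketch below uses made-up names (`values`, `as_int`, `rounded`) to show the difference, and one possible way to round first if that is the intended behavior.

```python
import numpy as np

values = np.array([1.3, 2.7, -1.7])

# astype(int) truncates toward zero: the fractional part is discarded
as_int = values.astype(int)
print(as_int)     # [ 1  2 -1]

# round first if nearest-integer conversion is what you want
rounded = np.round(values).astype(int)
print(rounded)    # [ 1  3 -2]

# the original array keeps its float values
print(values.dtype)   # float64
```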

## Operations between arrays

### Two arrays of the same shape
Any arithmetic operation between equal-size arrays is applied *element-wise*.
- (Not required) This is often called **vectorization**. Modern CPUs/GPUs are often designed for fast vectorized operations.

In [None]:
x = np.ones([4, 5])
x

In [None]:
# ndarray.reshape() is a commonly used method to change the shape of an array.
y = np.arange(20).reshape((4, 5))
y

In [None]:
x + y

In [None]:
x * y

### Array and Scalar
Arithmetic operations with scalars propagate the scalar value to each element of the array.

In [None]:
# recall what y is:
y

Exercise: What is the result of y + 1?

The above is a simple example of an important concept called **broadcasting** in NumPy. Simply put, broadcasting automatically changes the shape of a smaller array to match that of a larger array, so an arithmetic operation will make sense. 
- Used with caution, broadcasting can significantly simplify our code.
- (Not required) Just in case you are interested in learning more, see (https://numpy.org/devdocs/user/theory.broadcasting.html) for details.
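Broadcasting is not limited to scalars. As a small sketch (the names `grid`, `row`, and `col` are made up for illustration), a one-dimensional array can be stretched across every row of a two-dimensional array, and a column-shaped array across every column:

```python
import numpy as np

grid = np.arange(6).reshape((2, 3))   # [[0 1 2], [3 4 5]]
row = np.array([10, 20, 30])          # shape (3,)
col = np.array([[100], [200]])        # shape (2, 1)

# row is added to every row of grid
print(grid + row)
# [[10 21 32]
#  [13 24 35]]

# col is added to every column of grid
print(grid + col)
# [[100 101 102]
#  [203 204 205]]
```

If the shapes cannot be matched this way (for example, adding an array of length 4 to `grid`), NumPy raises a `ValueError` instead of guessing.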

## Indexing and Slicing
Indexing and slicing NumPy arrays is similar to slicing Python lists, and extends intuitively to n dimensions.

In [None]:
y

In [None]:
y[0, 0]

In [None]:
y[3, 4]

In [None]:
y[-1, -1]

In [None]:
y[0, :]

In [None]:
y[:, -1]

In [None]:
y[0:2, 0:2]

Exercise: Select the first three rows of y

Exercise: Select the first and third rows of y

**An important difference between NumPy `array` and Python `list`**: array slices are views on the original array. Any modification to the view is reflected in the source array.

In [None]:
y

In [None]:
y[0:2, :] = 0

In [None]:
y
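For contrast, the following sketch places a list slice and an array slice side by side. The names `numbers`, `list_slice`, `arr`, and `arr_slice` are made up so that the cell does not touch `y`.

```python
import numpy as np

# slicing a Python list produces a new, independent list
numbers = [0, 1, 2, 3]
list_slice = numbers[0:2]
list_slice[0] = 99
print(numbers)    # [0, 1, 2, 3]

# slicing an ndarray produces a view that shares data with the source
arr = np.arange(4)
arr_slice = arr[0:2]
arr_slice[0] = 99
print(arr)        # [99  1  2  3]

# np.shares_memory() confirms that the view and the source overlap
print(np.shares_memory(arr, arr_slice))   # True
```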

**If you want to avoid changing the original array, make a copy.** Any change to the copy won't affect the original array.

In [None]:
y = np.arange(20).reshape((4, 5))
x = y.copy()
y

In [None]:
x[0:2, :] = 1

In [None]:
x

In [None]:
y

### Boolean Indexing

In [None]:
z = x - 12
z

Sometimes we want to select or modify elements of an array based on a logical condition. Below are two common examples.

First, suppose we want to extract all nonnegative elements from z:

In [None]:
nonnegative_elements_of_z = z[z>=0]
nonnegative_elements_of_z

Second, suppose we want to set all negative elements in z to 0. To do so, we first create an array of boolean values:

In [None]:
check_if_below_0 = z < 0
check_if_below_0

Then use the boolean array to set all negative elements in z to 0:

In [None]:
z[check_if_below_0] = 0
z

In [None]:
# the above can also be done with the following simpler code:
# z[z<0] = 0

## How to represent missing values and infinity?
Missing values can be represented using the `np.nan` object, while `np.inf` represents infinity.

In [None]:
x = np.ones((3, 4))
x

In [None]:
x[0, 0] = np.nan
x

In [None]:
x + 1
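Because `np.nan` propagates through arithmetic, it usually needs special handling. The sketch below, using a made-up array named `data`, shows a few standard NumPy tools for this: `np.isnan()` to detect missing values and `np.nanmean()` to compute a mean that ignores them.

```python
import numpy as np

data = np.array([1.0, np.nan, 3.0])

# nan is not equal to anything, including itself, so == cannot detect it
print(np.nan == np.nan)    # False

# np.isnan() is the reliable way to find missing values
print(np.isnan(data))      # [False  True False]

# a single nan makes the ordinary mean nan as well
print(np.mean(data))       # nan

# the nan-aware version skips missing values
print(np.nanmean(data))    # 2.0
```

Note that `np.nan` is a floating-point value, so it can only be stored in arrays with a float data type.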

## Functions and Statistics

### Universal Functions `ufuncs`: Fast Element-wise Array Functions

Many commonly used math functions, e.g., `sqrt()` and `maximum()`, are supported by NumPy through universal functions (a.k.a. ufuncs). A ufunc applies its operation *element-wise* and supports broadcasting. Let's look at a few examples. For more details, see (https://docs.scipy.org/doc/numpy/reference/ufuncs.html).

In [None]:
x = np.random.randint(0, 11, (4, 3))
x

Unary ufuncs, e.g., `abs()` and `sqrt()`, take one argument and perform element-wise transformations.

In [None]:
np.sqrt(x)

Binary ufuncs, e.g., `subtract()`, take two arguments.

In [None]:
np.subtract(x, np.ones((4, 3)))

In [None]:
# alternatively, we can write
x - np.ones((4, 3))

In [None]:
# alternatively, we can write
x-1.

One more example:

In [None]:
np.maximum(x, np.ones((4, 3))*5)

### Use a `lambda` function to perform complex element-wise array operations

In [None]:
# suppose we want to normalize x and center it around 0
f = lambda e: (e-5)/10

In [None]:
f(x)

### Statistics
NumPy statistics functions include `median`, `average`, `mean`, `amax`, `amin`, and many others.

<a href="https://docs.scipy.org/doc/numpy-1.13.0/reference/routines.statistics.html"> Reference </a>

In [None]:
x = np.random.randint(0, 11, (4, 5))
x

In [None]:
np.median(x, axis=0)

In [None]:
np.median(x, axis=1)

In [None]:
np.mean(x, axis=0)

In [None]:
np.mean(x, axis=1)

In [None]:
np.min(x, axis=0)
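Since the arrays above are random, it can be hard to see what the `axis` argument does. Here is a small sketch with a fixed, made-up array named `scores`: `axis=0` collapses the rows and gives one result per column, `axis=1` collapses the columns and gives one result per row, and omitting `axis` aggregates over all elements.

```python
import numpy as np

scores = np.array([[1, 2, 3],
                   [4, 5, 6]])

# axis=0: one mean per column
print(np.mean(scores, axis=0))   # [2.5 3.5 4.5]

# axis=1: one mean per row
print(np.mean(scores, axis=1))   # [2. 5.]

# no axis: mean of all elements
print(np.mean(scores))           # 3.5
```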

### Exercises

Create a random integer array `x` of shape (4, 3) with values between 0 and 10 (inclusive of both 0 and 10).

Calculate the mean of each column.

Subtract the mean of each column from the elements of that column.

Set the elements of x that are less than 5 to 0.