![numpy](https://raw.githubusercontent.com/donnemartin/data-science-ipython-notebooks/master/images/numpy.png)

[NumPy](https://www.numpy.org/) is the fundamental package for scientific computing with Python. Many other data science packages, especially those that work with matrices, rely on it for its speed and utility.

# Objectives

- Use NumPy to create arrays and perform efficient operations with them
- Use NumPy's other mathematical tools relevant to data analysis

For NumPy, the standard alias is `np`.

In [None]:
import numpy as np

# NumPy Arrays

Python lists and NumPy arrays can both hold numbers. However, Python lists have limited functionality for mathematical operations. NumPy arrays make it easy and fast to do math with a collection of numbers.

In [None]:
x = np.array([1, 2, 3])
print(x)
print(type(x))

Note that there is an [`array` class in base Python](https://docs.python.org/3/library/array.html), but we will not be using it. It is essentially a list constrained to one type (e.g. int).

In [None]:
import array
x = array.array('i', [1, 2, 3])
print(x)
print(type(x))

## Character Arrays

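A NumPy array has a single data type for all of its elements, so it holds either strings or numbers, not a mix of both. There is a special `chararray` object you can use with strings; the NumPy documentation describes it as a legacy class kept for backwards compatibility.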

In [None]:
names_list = ['Bob', 'John', 'Sally']

# Use numpy.array for numbers and numpy.char.array for strings.

names_array = np.char.array(['Bob', 'John', 'Sally'])

print(names_list)
print(names_array)
type(names_array)

In [None]:
# The character array has string-functionality that numeric
# arrays don't have.

names_array.endswith('b')
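
To make the "strings or numbers, not both" point concrete, here is a small sketch. The names `mixed`, `names`, and `ends_with_b` are made up for this illustration. It shows what happens when the two kinds of values are mixed, and that the functions in the `np.char` module offer string operations on an ordinary array as well.

```python
import numpy as np

# Mixing numbers and strings does not fail: NumPy picks one common dtype,
# here a Unicode string type, and converts every element to it.
mixed = np.array([1, 2, 'three'])
print(mixed.dtype.kind)   # 'U' means Unicode string
print(mixed)

# String operations are also available on an ordinary string array
# through the functions in the np.char module.
names = np.array(['Bob', 'John', 'Sally'])
ends_with_b = np.char.endswith(names, 'b')
print(ends_with_b)
```

Note that the numbers in `mixed` have become the strings `'1'` and `'2'`, so arithmetic on them no longer works.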

## Numeric Arrays

Let's make a list and an array of three numbers, and see how they behave differently.

In [None]:
numbers_list = [0, 5, 7]
numbers_array = np.array([0, 5, 7])

### Type

Numeric arrays have the `ndarray` type, which is short for N-Dimensional Array.

In [None]:
print(type(numbers_list))
print(type(numbers_array))

## Arithmetic Operations

Arithmetic operators (e.g. +, -, * and /) work according to mathematical principles for arrays, unlike with lists. These operations are done "element-wise".

In [None]:
# multiply the array by 3

numbers_array * 3

In [None]:
# multiply the list by 3

numbers_list * 3

In [None]:
numbers_array + 20

In [None]:
numbers_list + 20
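
The four cells above can be summarized in one place. This is only a sketch: `repeated_list`, `scaled_list`, `shifted_list`, `scaled_array`, and `shifted_array` are names introduced here for illustration. It shows what the operators mean for a list, and the comprehension you would need to get element-wise behavior without NumPy.

```python
import numpy as np

numbers_list = [0, 5, 7]
numbers_array = np.array([0, 5, 7])

# For a list, * means repetition (and + means concatenation).
repeated_list = numbers_list * 3          # [0, 5, 7, 0, 5, 7, 0, 5, 7]

# Element-wise math on a list needs an explicit loop or comprehension.
scaled_list = [n * 3 for n in numbers_list]
shifted_list = [n + 20 for n in numbers_list]

# For an array, the same operators act on every element.
scaled_array = numbers_array * 3
shifted_array = numbers_array + 20

# Adding an int to a list is an error, not element-wise addition.
try:
    numbers_list + 20
except TypeError as err:
    print("TypeError:", err)

print(scaled_array, shifted_array)
```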

### Speed

Below, you will find a piece of code we will use to compare the speed of operations on lists vs arrays.

In [None]:
size_of_vec = 1000

X = list(range(size_of_vec))
Y = list(range(size_of_vec))

In [None]:
%timeit [X[i] + Y[i] for i in range(size_of_vec)]

In [None]:
X = np.arange(size_of_vec)
Y = np.arange(size_of_vec)

In [None]:
%timeit X + Y
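
`%timeit` is an IPython magic, so it only works inside a notebook or IPython session. As a rough equivalent for a plain Python script, the sketch below uses the standard library's `timeit` module. The variable names and the choice of 1,000 repetitions are assumptions made for this example, and the absolute timings will differ from machine to machine.

```python
import timeit
import numpy as np

size_of_vec = 1000
X_list = list(range(size_of_vec))
Y_list = list(range(size_of_vec))
X_arr = np.arange(size_of_vec)
Y_arr = np.arange(size_of_vec)

# Total time for 1,000 repetitions of each approach.
list_time = timeit.timeit(
    lambda: [X_list[i] + Y_list[i] for i in range(size_of_vec)], number=1000)
array_time = timeit.timeit(lambda: X_arr + Y_arr, number=1000)

print(f"list comprehension: {list_time:.4f} s")
print(f"array addition:     {array_time:.4f} s")

# Both approaches compute the same values.
list_result = [x + y for x, y in zip(X_list, Y_list)]
array_result = X_arr + Y_arr
```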

## Array Attributes and Methods

Type `numbers_list.` and then hit `TAB`. What options do you have?

In [None]:
numbers_list.

The names of standard Python list methods appear:

- `append(x)` (add x to the end of the list)
- `clear()` (delete all elements of the list)
- `copy()` (make a copy of the list)
- `count(x)` (return the number of instances of x in the list)
- `extend([x, y])` (add x and y to the end of the list)
- `index(x)` (return the position in the list of x)
- `insert(x, y)` (insert y into position x in the list)
- `pop(i=-1)` (remove and return the element at position i in the list)
- `remove(x)` (remove x from the list)
- `reverse()` (reverse the order of the elements of the list)
- `sort()` (sort the elements of the list)

Now type `numbers_array.` and then hit `TAB`. What options do you have?

In [None]:
numbers_array.

Turns out, there are a _bunch_ of new tools!

### Numeric Methods

- `max()` (return the greatest value in the array)

In [None]:
numbers_array.max()

- `mean()` (return the arithmetic mean of the array)

In [None]:
numbers_array.mean()

- `min()` (return the smallest value in the array)

In [None]:
numbers_array.min()

- `round()` (round each entry in the array to a specified number of decimal places)

In [None]:
np.array([9.5, 1.2, 6.3]).round()

- `std()` (return the standard deviation of the array)

In [None]:
numbers_array.std()

- `sum()` (return the sum of the array's elements)

In [None]:
numbers_array.sum()

### Boolean Methods

- `all()` (returns True iff bool(element) == True for all elements in the array)

In [None]:
numbers_array.all()

- `any()` (returns True iff bool(element) == True for some element in the array)

In [None]:
numbers_array.any()
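
Here is a short sketch of why the two results differ for our array, and of how these methods are often combined with comparisons. The names `all_non_negative` and `any_above_six` are invented for this example.

```python
import numpy as np

numbers_array = np.array([0, 5, 7])

# 0 is falsy, so not every element is truthy...
print(numbers_array.all())   # False
# ...but at least one element is nonzero.
print(numbers_array.any())   # True

# Applied to a comparison, the same methods answer questions about the data.
all_non_negative = (numbers_array >= 0).all()
any_above_six = (numbers_array > 6).any()
print(all_non_negative, any_above_six)
```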

# Multi-Dimensional Indexing

Arrays are especially powerful when dealing with numbers in multiple dimensions - for example, a dataset with rows and columns. We will primarily work with such 2-dimensional arrays.

In [None]:
nums = np.array([[1, 2, 3], [4, 5, 6]])
print(nums)

In [None]:
print(nums.shape)

In [None]:
nums[0, 2]

This is more efficient than `nums[0][2]`. Why?

In [None]:
%timeit nums[0, 2]

In [None]:
%timeit nums[0][2]
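
One way to see where the extra time goes is to spell out the two steps hidden in the chained form. In this sketch, `row`, `chained`, and `direct` are names chosen for illustration.

```python
import numpy as np

nums = np.array([[1, 2, 3], [4, 5, 6]])

# Chained indexing happens in two steps.
row = nums[0]        # step 1: build a 1-D array (a view of the first row)
chained = row[2]     # step 2: index into that intermediate array

# Tuple indexing looks the element up in a single step.
direct = nums[0, 2]

print(type(row), chained, direct)
```

Both forms return the same value; the chained form just creates an intermediate array object along the way.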

### Slicing

Use commas to separate the axes and colons to [slice](https://numpy.org/doc/stable/reference/arrays.indexing.html) sections of an array. A trailing comma or colon selects everything along the remaining axis.

In [None]:
nums[1]

In [None]:
nums[1,]

In [None]:
nums[1,:]

In [None]:
nums[:,1]
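
The shape of the result depends on whether each axis gets an integer or a slice. The following sketch uses made-up names (`second_row`, `second_col`, `col_as_2d`, `top_right`) to show the difference.

```python
import numpy as np

nums = np.array([[1, 2, 3], [4, 5, 6]])

second_row = nums[1, :]       # an integer index drops that axis -> shape (3,)
second_col = nums[:, 1]       # shape (2,)
col_as_2d = nums[:, 1:2]      # a slice keeps the axis -> shape (2, 1)
top_right = nums[0:1, 1:]     # slices on both axes -> shape (1, 2)

for piece in (second_row, second_col, col_as_2d, top_right):
    print(piece.shape, piece.tolist())
```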

### Reshaping

In [None]:
new = np.array([[1, 2, 3], [4, 5, 6]])
new

- `shape` (stores the dimensions of the array as a tuple)

In [None]:
new.shape

- `ravel()` (reduce array to one dimension)

In [None]:
new.ravel()

- `reshape()` (return an array with the specified dimensions)

In [None]:
new.reshape(3, 2)

- `T` (stores the transpose of the array)

In [None]:
new.T
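
`reshape(3, 2)` and `T` both turn our 2x3 array into a 3x2 array, but they do not give the same result. A minimal sketch of the difference, with illustrative variable names:

```python
import numpy as np

new = np.array([[1, 2, 3], [4, 5, 6]])

reshaped = new.reshape(3, 2)   # refills a 3x2 grid in row-major order
transposed = new.T             # swaps rows and columns

print(reshaped)
print(transposed)

# -1 asks NumPy to infer that dimension from the array's size.
inferred = new.reshape(-1, 2)
print(inferred.shape)
```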

# NumPy Functions

NumPy has a bunch of functions, besides array methods, that are helpful for working with arrays.

## Array Constructors

In [None]:
print(np.zeros(10))
print(np.ones(10))
print(np.arange(10, dtype=float))
print(np.linspace(0.1, 1, 10))
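
`np.arange()` and `np.linspace()` are easy to mix up, so here is a small side-by-side sketch. The names `by_step`, `by_count`, and `grid` are invented for the example.

```python
import numpy as np

# np.arange(start, stop, step): the spacing is given, stop is excluded.
by_step = np.arange(0, 1, 0.25)

# np.linspace(start, stop, num): the number of points is given, stop is included.
by_count = np.linspace(0, 1, 5)

print(by_step)
print(by_count)

# Constructors such as np.zeros also accept a shape tuple for 2-D arrays.
grid = np.zeros((2, 3))
print(grid.shape)
```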

### `np.concatenate()`

In [None]:
np.concatenate([[1, 2], [3, 4]])
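
With 2-dimensional inputs, the `axis` argument decides the direction in which the arrays are joined. A quick sketch, using example arrays `a` and `b` that are not part of the cells above:

```python
import numpy as np

a = np.array([[1, 2], [3, 4]])
b = np.array([[5, 6], [7, 8]])

stacked_rows = np.concatenate([a, b])           # axis=0 is the default -> shape (4, 2)
stacked_cols = np.concatenate([a, b], axis=1)   # side by side -> shape (2, 4)

print(stacked_rows)
print(stacked_cols)
```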

## Filtering

In [None]:
data = np.array([10, 3, 4, 7, 6])

In [None]:
data < 5

In [None]:
data[data < 5]
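
Boolean masks can also be combined. The sketch below, with illustrative names, uses the element-wise operators `&`, `|`, and `~`; the Python keywords `and`, `or`, and `not` do not work element-wise on arrays.

```python
import numpy as np

data = np.array([10, 3, 4, 7, 6])

# The parentheses are required because & and | bind tighter than comparisons.
between = data[(data > 3) & (data < 10)]
extremes = data[(data < 4) | (data > 9)]
not_small = data[~(data < 5)]

print(between, extremes, not_small)
```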

### `np.where()`

In [None]:
np.where(data < 5, "Low", "High")

### `np.select()`

In [None]:
conditions = [data < 5, data > 9]

choices = ['small', 'big!']

In [None]:
np.select(conditions, choices, default='other')
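
When more than one condition is true for an element, `np.select()` uses the first one that matches. The sketch below uses deliberately overlapping conditions (an assumption of this example, not the ones above) and shows the same labels written as nested `np.where()` calls.

```python
import numpy as np

data = np.array([10, 3, 4, 7, 6])

# The second condition overlaps the first: 3 and 4 satisfy both.
conditions = [data < 5, data < 8]
choices = ['small', 'medium']
labels = np.select(conditions, choices, default='big!')

# The same labels written as nested np.where calls.
nested = np.where(data < 5, 'small', np.where(data < 8, 'medium', 'big!'))

print(labels)
print(nested)
```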

## Broadcasting

Two arrays can be combined with mathematical operations via [broadcasting](https://numpy.org/doc/stable/user/basics.broadcasting.html).  

From the [docs](https://numpy.org/doc/stable/user/basics.broadcasting.html):

    When operating on two arrays, NumPy compares their shapes element-wise. It starts with the trailing dimensions and works its way forward. Two dimensions are compatible when
    - they are equal, or
    - one of them is 1
    
Let's try to figure out what will happen in the operations below.

In [None]:
arr1 = np.array([[1, 2, 3], [4, 5, 6]])
arr2 = np.array([10, 100, 1000])
print(arr1)
print(arr2)

In [None]:
print(arr1.shape)
print(arr2.shape)

In [None]:
arr1 * arr2

In [None]:
arr3 = np.array([10, 100])
arr3.shape

In [None]:
arr1 * arr3

In [None]:
arr4 = np.array([[10, 100]])
arr4.shape

In [None]:
arr1 * arr4

In [None]:
arr5 = np.array([[10],[100]])
arr5.shape

In [None]:
arr1 * arr5
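
The rule quoted from the docs can be written out as a few lines of Python. This is a sketch only: `shapes_compatible` is a hypothetical helper written for this notebook, not a NumPy function. It checks the shapes used in the cells above against the shape of `arr1`.

```python
import numpy as np

def shapes_compatible(shape_a, shape_b):
    """Apply the broadcasting rule: compare dimensions from the right."""
    for dim_a, dim_b in zip(reversed(shape_a), reversed(shape_b)):
        if dim_a != dim_b and dim_a != 1 and dim_b != 1:
            return False
    # Dimensions missing on the left of the shorter shape are treated as 1.
    return True

arr1 = np.array([[1, 2, 3], [4, 5, 6]])

for other_shape in [(3,), (2,), (1, 2), (2, 1)]:
    print(other_shape, shapes_compatible(arr1.shape, other_shape))

# Shape (2, 1) is stretched across the columns: each row gets its own multiplier.
result = arr1 * np.array([[10], [100]])
print(result)
```

Only `(3,)` and `(2, 1)` pass the check, which matches the cells above that run without a `ValueError`.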

# Other NumPy Tools

NumPy comes with an assortment of mathematical tools that can come in handy, separate from arrays.

## Trigonometry 

- `np.pi` for $\pi$

In [None]:
np.pi

- `np.sin()` for the sine function

In [None]:
np.sin(np.pi / 6)

## Sequences


- `np.cumsum()` to calculate the cumulative (running) sum of sequence terms

In [None]:
np.cumsum([1, 4, 9, 16])

- `np.diff()` to calculate the differences between consecutive sequence terms

In [None]:
np.diff([1, 4, 9, 16])
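
The two functions roughly undo each other, which the following sketch illustrates. The names `squares`, `running_total`, `steps`, and `recovered` are chosen for this example.

```python
import numpy as np

squares = np.array([1, 4, 9, 16])

running_total = np.cumsum(squares)   # 1, 1+4, 1+4+9, 1+4+9+16
steps = np.diff(squares)             # 4-1, 9-4, 16-9 (one element shorter)

# Differencing a running total recovers the original terms after the first.
recovered = np.diff(running_total)

print(running_total, steps, recovered)
```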

## Logarithms

- `np.exp()` for Euler's number $e$ raised to a given power

In [None]:
np.exp(2)

- `np.log()` for natural (base $e$) logarithms

In [None]:
np.log(10)
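
A short sketch of how these functions relate to each other, using illustrative variable names. `np.log10()` and `np.log2()` are the NumPy functions for the other two common bases.

```python
import numpy as np

# np.log is the natural logarithm, the inverse of np.exp.
round_trip = np.log(np.exp(2))

# Other bases have their own functions.
log_base_10 = np.log10(1000)
log_base_2 = np.log2(8)

# Both np.exp and np.log work element-wise on arrays.
values = np.log(np.array([1, np.e, np.e ** 2]))

print(round_trip, log_base_10, log_base_2, values)
```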

## np.nan

NaN stands for "not a number". NumPy's `np.nan` constant is a handy way of representing such values, and it is very useful for marking missing data.

Since `np.nan` is a float, we don't get errors when doing operations on arrays that have missing data.

In [None]:
type(np.nan)

In [None]:
arr5 = np.array([1, 10, np.nan])

In [None]:
arr5.mean()

Even though the array has a NaN, we don't get an error in calculating its mean; the result is simply `nan`. Moreover, we can do this:

In [None]:
np.nansum(arr5) / len(arr5)

Is this the right measure of the mean? Well, maybe: it treats the missing value as 0 but still counts it in the denominator. But if not, we also have this:

In [None]:
np.nanmean(arr5)
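
To compare the three "means" side by side, here is a sketch with made-up names (`values`, `missing`, `plain_mean`, `zero_filled_mean`, `skipped_mean`). It also shows `np.isnan()`, which is needed to locate NaNs because a NaN is not equal to anything, not even itself.

```python
import numpy as np

values = np.array([1, 10, np.nan])

# NaN is not equal to anything, including itself, so == cannot find it.
print(np.nan == np.nan)          # False

missing = np.isnan(values)       # boolean mask of the missing entries

plain_mean = values.mean()                            # nan: the NaN propagates
zero_filled_mean = np.nansum(values) / len(values)    # 11 / 3: NaN counted as 0
skipped_mean = np.nanmean(values)                     # 11 / 2: NaN left out entirely

print(missing, plain_mean, zero_filled_mean, skipped_mean)
```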

## np.inf

Sometimes you will end up with values of infinity, represented by `np.inf`, for example when a nonzero array element is divided by zero. `np.inf` is a float, just like `np.nan`.

In [None]:
np.array([2])/np.array([0])

In [None]:
type(np.inf)

In [None]:
np.isfinite(np.inf)

This can also be useful when handling edge cases in custom functions.

In [None]:
def inv(x):
    # Inverse square of x; plain Python raises ZeroDivisionError when x == 0.
    return x**(-2)

In [None]:
inv(0)

In [None]:
def inverse(x):
    if x == 0:
        val = np.inf
    else:
        val = x**(-2)
    return val

In [None]:
inverse(0)
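
The function above handles one number at a time. As a possible extension, the sketch below does the same thing for a whole array at once. `inverse_square` is a hypothetical name, and the approach (float division inside `np.errstate`) is just one way to do it.

```python
import numpy as np

def inverse_square(x):
    """Return x**(-2) element-wise, with np.inf wherever x is 0."""
    x = np.asarray(x, dtype=float)
    # Suppress the divide-by-zero warning; float division by 0 yields inf.
    with np.errstate(divide='ignore'):
        return 1.0 / x ** 2

result = inverse_square([0, 1, 2])
print(result)
print(np.isfinite(result))
```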