# NumPy

## 1. Importing Python Libraries

Part of the reason why Python is such a powerful tool for data science is that other people have written and optimized functions and wrapped them into **libraries** that we can bring into our own work.

![numpy](https://raw.githubusercontent.com/donnemartin/data-science-ipython-notebooks/master/images/numpy.png)

[NumPy](https://www.numpy.org/) is the fundamental package for scientific computing with Python. 


To use a package in your current workspace, type `import` followed by the name of the library, as shown below.

In [None]:
import numpy

That worked because NumPy is [included with Anaconda](https://docs.anaconda.com/anaconda/packages/py3.7_osx-64/), so it was installed when you installed Anaconda. Other packages will need to be installed before you can import them.

Many packages have standard import aliases. We effect this aliasing by using the Python keyword `as`. For numpy, the standard alias is `np`.

In [None]:
import numpy as np

x = np.array([1, 2, 3])
print(x)
type(x)

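Of course we could use almost any alias we like, including the names of Python built-ins such as `list` or `sum`! But if we did this, we'd overwrite the meaning of those names in our workspace. (True keywords such as `for` or `class` cannot be used as aliases at all: Python raises a `SyntaxError`.)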
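
As a minimal sketch of what overwriting a name looks like, the cell below uses the built-in name `sum` as a deliberately bad alias. The alias choice is an assumption made purely for illustration; it is not something you would do in real code.

```python
import builtins

import numpy as sum  # legal, but `sum` now refers to the NumPy module

# The alias hides Python's built-in sum() in this namespace.
alias_hides_builtin = sum is not builtins.sum

# The module still works under its odd name...
arr = sum.array([1, 2, 3])

# ...and the original built-in is still reachable through `builtins`.
total = builtins.sum([1, 2, 3])

print(alias_hides_builtin, arr, total)
```

Sticking to the standard alias `np` avoids this kind of confusion.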

## 2. NumPy versus base Python

Now that we know libraries exist, why would we want to use them? Let us compare base Python with NumPy.

Base Python has lists and can do basic arithmetic. NumPy adds a dedicated object, the array, which has a few advantages over lists that we will look at below.

In [None]:
names_list = ['Bob', 'John', 'Sally']

# Use numpy.array for numbers and numpy.char.array for strings.

names_array = np.char.array(['Bob', 'John', 'Sally'])

print(names_list)
print(names_array)

In [None]:
# The character array has string-functionality that regular
# NumPy arrays don't have.

names_array.endswith('b')

In [None]:
# Let's make a list and an array of three numbers

numbers_list = [0, 5, 7]
numbers_array = np.array([0, 5, 7])

In [None]:
# multiply the array by 3

numbers_array * 3

In [None]:
# multiply the list by 3

numbers_list * 3

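Notice the difference: multiplying the array by 3 tripled each element, whereas multiplying the list by 3 repeated the list three times. Likewise, `numpy` arrays support the division operator `/`, applied element by element, while dividing a Python list raises a `TypeError`. There are other things that make it useful to utilize `numpy` over base Python for evaluating data.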
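
The contrast between list and array arithmetic can be sketched as follows. The variable names `repeated`, `tripled`, `halved`, and `list_division_error` are ours, chosen for illustration.

```python
import numpy as np

numbers_list = [0, 5, 7]
numbers_array = np.array([0, 5, 7])

# Multiplying a list by an integer repeats it; it does not scale the values.
repeated = numbers_list * 3

# Arithmetic on an array is applied element by element.
tripled = numbers_array * 3
halved = numbers_array / 2

# Division is not defined for lists, so Python raises a TypeError.
try:
    numbers_list / 2
    list_division_error = None
except TypeError as err:
    list_division_error = err

print(repeated)
print(tripled)
print(halved)
print(list_division_error)
```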

Below, you will find a piece of code we will use to compare the speed of operations on a list and operations on an array.

In [None]:
size_of_vec = 1000

# Build real lists so that the comparison is list versus array
X = list(range(size_of_vec))
Y = list(range(size_of_vec))

In [None]:
%timeit [X[i] + Y[i] for i in range(len(X))]

In [None]:
X = np.arange(size_of_vec)
Y = np.arange(size_of_vec)

In [None]:
%timeit X + Y
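
`%timeit` is an IPython magic, so it only works in a notebook or IPython shell. As a rough sketch of the same comparison in a plain Python script, the standard-library `timeit` module can be used instead. The helper names `add_lists` and `add_arrays` are hypothetical, and the timings will differ from machine to machine.

```python
import timeit

import numpy as np

size_of_vec = 1000

X_list = list(range(size_of_vec))
Y_list = list(range(size_of_vec))
X_arr = np.arange(size_of_vec)
Y_arr = np.arange(size_of_vec)


def add_lists():
    # Element-by-element addition in base Python
    return [x + y for x, y in zip(X_list, Y_list)]


def add_arrays():
    # Vectorized addition in NumPy
    return X_arr + Y_arr


# Both approaches compute the same values.
same_result = add_lists() == add_arrays().tolist()

# Total seconds for 1,000 calls of each function.
list_seconds = timeit.timeit(add_lists, number=1000)
array_seconds = timeit.timeit(add_arrays, number=1000)

print(same_result, list_seconds, array_seconds)
```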

## 3. What Else Can NumPy Do?

Type `numbers_list.` and then hit `TAB`. What options do you have?

In [None]:
numbers_list.

The names of standard Python list attributes and methods appear:

- `append(x)` (add x to the end of the list)
- `clear()` (delete all elements of the list)
- `copy()` (make a copy of the list)
- `count(x)` (return the number of instances of x in the list)
- `extend([x, y])` (add x and y to the end of the list)
- `index(x)` (return the position in the list of the first occurrence of x)
- `insert(i, x)` (insert x into position i in the list)
- `pop(i=-1)` (remove and return the element at position i in the list)
- `remove(x)` (remove the first occurrence of x from the list)
- `reverse()` (reverse the order of the elements of the list)
- `sort()` (sort the elements of the list)

Now type `numbers_array.` and then hit `TAB`. What options do you have?

In [None]:
numbers_array.

Now there are many new options!

### NumPy Array Attributes and Methods

- `all()` (returns True iff bool(element) == True for every element in the array)

In [None]:
numbers_array.all()

- `any()` (returns True iff bool(element) == True for at least one element in the array)

In [None]:
numbers_array.any()

- `cumprod()` (maps \[$x_0, x_1, \ldots, x_{n-1}$\] to \[$x_0, x_0 \times x_1, \ldots, \prod_{i=0}^{n-1} x_i$\])

In [None]:
np.array([1, 2, 3]).cumprod()

- `cumsum()` (maps \[$x_0, x_1, \ldots, x_{n-1}$\] to \[$x_0, x_0 + x_1, \ldots, \sum_{i=0}^{n-1} x_i$\])

In [None]:
numbers_array.cumsum()

- `max()` (return the greatest value in the array)

In [None]:
numbers_array.max()

- `mean()` (return the arithmetic mean of the array)

In [None]:
numbers_array.mean()

- `min()` (return the smallest value in the array)

In [None]:
numbers_array.min()

- `ravel()` (flatten an array)

In [None]:
new = np.array([[1, 2, 3], [4, 5, 6]])
new

In [None]:
new.ravel()

- `reshape()` (return the array's elements rearranged into a specified shape; the total number of elements must stay the same)

In [None]:
new.reshape(3, 2)
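
Reshaping is easy to confuse with transposing (the `T` attribute listed further down). The sketch below, using our own illustrative variable names, shows that `reshape` refills the new shape in row order, that `-1` lets NumPy infer one dimension, and that an incompatible shape raises a `ValueError`.

```python
import numpy as np

new = np.array([[1, 2, 3], [4, 5, 6]])

# reshape keeps the elements in row order and refills the new shape.
reshaped = new.reshape(3, 2)    # [[1, 2], [3, 4], [5, 6]]

# The transpose swaps the axes instead.
transposed = new.T              # [[1, 4], [2, 5], [3, 6]]

# -1 asks NumPy to infer that dimension from the number of elements.
inferred = new.reshape(-1, 2)

# Six elements cannot fill a 4 x 2 array.
try:
    new.reshape(4, 2)
    reshape_error = None
except ValueError as err:
    reshape_error = err

print(reshaped)
print(transposed)
print(inferred.shape)
print(reshape_error)
```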

- `round()` (round each entry in the array to a specified number of decimal places)

In [None]:
np.array([9.5, 1.2, 6.3]).round()

- `shape` (stores the dimensions of the array as a tuple)

In [None]:
numbers_array.shape

- `std()` (return the standard deviation of the array; by default this is the population form, which divides by $n$)

In [None]:
numbers_array.std()
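
By default `std()` divides by $n$. If you want the sample standard deviation, which divides by $n - 1$, pass `ddof=1`. A small sketch, with variable names of our own choosing:

```python
import numpy as np

numbers_array = np.array([0, 5, 7])

# Default: population standard deviation (ddof=0, divide by n).
population_std = numbers_array.std()

# Sample standard deviation (ddof=1, divide by n - 1).
sample_std = numbers_array.std(ddof=1)

# The population form computed by hand, for comparison.
deviations = numbers_array - numbers_array.mean()
manual_std = np.sqrt((deviations ** 2).mean())

print(population_std, sample_std, manual_std)
```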

- `sum()` (return the sum of the array's elements)

In [None]:
numbers_array.sum()

- `T` (stores the transpose of the array)

In [None]:
new.T

### Better Math Tools

#### Trigonometry:
- `np.pi` for $\pi$

In [None]:
np.pi

- `np.sin()` for the sine function

In [None]:
np.sin(np.pi / 6)

- `np.cos()` for the cosine function
- `np.tan()` for the tangent function
- `np.sinh()` for the hyperbolic sine function
- `np.cosh()` for the hyperbolic cosine function
- `np.tanh()` for the hyperbolic tangent function

#### Number Theory:
- `np.binary_repr()` to return the binary representation of an integer as a string

In [None]:
np.binary_repr(10)

- `np.diff()` to calculate, recursively, the differences between sequence terms

In [None]:
np.diff([1, 4, 9, 16])

In [None]:
np.diff([1, 4, 9, 16], n=2)
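
The "recursively" above means that `n=2` applies the difference operation twice. A short sketch with illustrative variable names:

```python
import numpy as np

squares = [1, 4, 9, 16]

# First differences: 4-1, 9-4, 16-9
first = np.diff(squares)

# Second differences: the differences of the first differences
second = np.diff(squares, n=2)

# Applying np.diff twice by hand gives the same result.
second_by_hand = np.diff(np.diff(squares))

print(first, second, second_by_hand)
```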

- `np.gcd()` for the greatest common divisor

In [None]:
np.gcd(8, 100)

#### Array Logic:
- `np.bitwise_not()`
- `np.bitwise_and()`

In [None]:
np.bitwise_and([True, False, True], [False, True, True])

- `np.bitwise_or()`
- `np.bitwise_xor()`
- `np.concatenate()`

In [None]:
np.concatenate([[1, 2], [3, 4]])

#### Complex Numbers:
- `complex()` (Python's built-in complex type; the old alias `np.complex` was removed in NumPy 1.24)

In [None]:
complex(2, -3)
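
NumPy arrays can hold complex values created with Python's built-in `complex()` or with literals such as `1j`. The following is a minimal sketch of a few operations on such an array; the variable names are ours.

```python
import numpy as np

z = complex(2, -3)
zs = np.array([z, 1j])

real_parts = zs.real          # real part of each element
imag_parts = zs.imag          # imaginary part of each element
conjugates = np.conj(zs)      # complex conjugate of each element
magnitudes = np.abs(zs)       # modulus of each element

print(zs.dtype, real_parts, imag_parts, conjugates, magnitudes)
```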

#### Data Analysis:
- `np.histogram()`

In [None]:
np.histogram([1, 2])
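
`np.histogram()` returns two arrays: the counts per bin and the bin edges. With no `bins` argument it uses 10 equal-width bins. The sketch below, on a small made-up data set, passes explicit bin edges so the counts are easy to check by hand.

```python
import numpy as np

# Default: 10 equal-width bins between the minimum and maximum.
default_counts, default_edges = np.histogram([1, 2])

# Explicit edges: bins are [0, 1), [1, 2) and [2, 3].
# Every bin is half-open except the last, which includes its right edge.
counts, edges = np.histogram([1, 2, 1, 3], bins=[0, 1, 2, 3])

print(len(default_counts), len(default_edges))
print(counts, edges)
```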

#### Logarithms:
- `np.exp()` for the exponential function $e^x$

In [None]:
np.exp(2)

- `np.log()` for the natural logarithm (base $e$)

In [None]:
np.log(10)
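
`np.log()` is the natural logarithm, so it undoes `np.exp()`. For other bases NumPy provides `np.log2()` and `np.log10()`. A brief sketch, with our own variable names:

```python
import numpy as np

# exp and log are inverses of each other.
round_trip = np.log(np.exp(2))

# The natural log of e is 1.
log_of_e = np.log(np.e)

# Base-2 and base-10 logarithms.
log2_of_8 = np.log2(8)
log10_of_1000 = np.log10(1000)

print(round_trip, log_of_e, log2_of_8, log10_of_1000)
```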

#### Linear Algebra:

`np.linalg` is an incredibly useful module for matrix mathematics, which we shall need in future lessons!
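
As a small taste of `np.linalg`, here is a sketch that solves a system of two linear equations and computes a determinant and an inverse. The matrix `A` and vector `b` are made-up values for illustration.

```python
import numpy as np

# The system  2x + y = 3,  x + 3y = 5  written as A @ solution = b
A = np.array([[2.0, 1.0], [1.0, 3.0]])
b = np.array([3.0, 5.0])

solution = np.linalg.solve(A, b)   # solves for [x, y]
determinant = np.linalg.det(A)     # 2*3 - 1*1
A_inverse = np.linalg.inv(A)       # matrix inverse of A

print(solution, determinant)
print(A_inverse @ A)               # approximately the identity matrix
```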

### More Tools

See [here](https://numpy.org/devdocs/user/basics.html) for more information about numpy. Let's go over some of these points:

#### [More numeric data types than base Python](https://numpy.org/devdocs/user/basics.types.html)

#### Intrinsic array constructors:

In [None]:
print(np.zeros(10))
print(np.ones(10))
print(np.arange(10, dtype=float))
print(np.linspace(0.1, 1, 10))

#### Multi-dimensional indexing:

In [None]:
nums = np.array([[1, 2, 3], [4, 5, 6]])
nums.shape

In [None]:
nums[0, 2]

Why is this more efficient than `nums[0][2]`?

In [None]:
%timeit nums[0, 2]

In [None]:
%timeit nums[0][2]
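
One way to see where the extra time goes: `nums[0][2]` is two indexing operations. The first builds an intermediate array for row 0, and the second indexes into that array. `nums[0, 2]` goes straight to the element. A sketch with illustrative variable names:

```python
import numpy as np

nums = np.array([[1, 2, 3], [4, 5, 6]])

# One indexing operation
direct = nums[0, 2]

# Two indexing operations: first an intermediate row, then the element
row = nums[0]
chained = row[2]

# The intermediate row is itself an array (a view onto the same data).
row_is_array = isinstance(row, np.ndarray)
row_is_view = np.shares_memory(row, nums)

print(direct, chained, row_is_array, row_is_view)
```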

#### Filtering:

In [None]:
data = np.array([10, 3, 4, 7, 6])

In [None]:
data[data < 5]

##### `np.where()`

In [None]:
np.where(data < 5, True, False)
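
`np.where()` has two common forms. With three arguments it picks, element by element, from the second or third argument depending on the condition. With one argument it returns the indices where the condition holds. A sketch using the same `data` array:

```python
import numpy as np

data = np.array([10, 3, 4, 7, 6])

# Three arguments: keep values below 5, replace everything else with 0.
replaced = np.where(data < 5, data, 0)

# Choosing between True and False just reproduces the boolean mask.
mask = np.where(data < 5, True, False)

# One argument: a tuple of index arrays, one per dimension.
(indices,) = np.where(data < 5)

print(replaced, mask, indices)
```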

##### `np.select()`

In [None]:
conditions = [data > 9, data % 2 == 1, data < 5]

choices = ['big!', 'not big but odd!', 'neither big nor odd but small!']

In [None]:
np.select(conditions, choices, default='other')
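
`np.select()` checks the conditions in order and, for each element, uses the choice that belongs to the first condition that is true. Elements that satisfy no condition get the `default`. The sketch below uses numeric choices (our own illustrative values) to make this easy to verify: 3 is both odd and small, but "odd" comes first in the list, so it wins.

```python
import numpy as np

data = np.array([10, 3, 4, 7, 6])

conditions = [data > 9, data % 2 == 1, data < 5]
choices = [100, 50, 10]  # illustrative labels for big, odd, small

labels = np.select(conditions, choices, default=0)

print(labels)
```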

#### Broadcasting:

In [None]:
arr1 = np.array([-1, -2, -3])
arr2 = -8

In [None]:
arr1 + arr2

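Two arrays can be broadcast together if, comparing their shapes from the last dimension backwards, each pair of dimensions either has *the same* value or one of them has a value of *1*. A missing leading dimension is treated as 1.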

In [None]:
arr3 = np.array([[-10., 3., 175.2], [25., 1.47, 9.36]])
arr4 = np.array([5, 5, 5])

In [None]:
arr3 * arr4
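
The shapes involved can be sketched as follows. The arrays `row`, `col`, and `bad` are made-up examples: the first two are compatible with `arr3`, while the third is not and raises a `ValueError`.

```python
import numpy as np

arr3 = np.array([[-10., 3., 175.2], [25., 1.47, 9.36]])  # shape (2, 3)

row = np.array([5, 5, 5])        # shape (3,): matches the last dimension
col = np.array([[1.0], [2.0]])   # shape (2, 1): the 1 stretches to 3
bad = np.array([1, 2])           # shape (2,): 2 does not match 3

row_result_shape = (arr3 * row).shape
col_result_shape = (arr3 * col).shape

try:
    arr3 * bad
    broadcast_error = None
except ValueError as err:
    broadcast_error = err

print(row_result_shape, col_result_shape)
print(broadcast_error)
```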

#### np.nan and np.inf

NaN stands for "not a number". NumPy's `np.nan` is a handy way of representing these, in part because `np.nan` *is a float!*

In [None]:
type(np.nan)

This makes it convenient to perform mathematical operations on arrays that contain NaNs.

In [None]:
arr5 = np.array([1, 10, np.nan])

In [None]:
arr5.mean()

Even though the array has a NaN, we don't get an error in calculating its mean; the result is simply `nan`. Moreover, we can do this:

In [None]:
np.nansum(arr5) / len(arr5)

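Is this the right measure of the mean? Well, maybe: it treats the NaN as zero but still counts it in the denominator. If that is not what we want, we also have this, which ignores the NaN entirely: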

In [None]:
np.nanmean(arr5)
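
The three calculations give three different answers, which the following sketch puts side by side. The variable names are ours.

```python
import numpy as np

arr5 = np.array([1, 10, np.nan])

# The plain mean propagates the NaN.
plain_mean = arr5.mean()

# nansum skips the NaN, but len() still counts it: (1 + 10) / 3
zero_filled_mean = np.nansum(arr5) / len(arr5)

# nanmean skips the NaN in both the sum and the count: (1 + 10) / 2
nan_mean = np.nanmean(arr5)

# NaN is never equal to anything, itself included, so test with np.isnan.
plain_mean_is_nan = np.isnan(plain_mean)

print(plain_mean, zero_filled_mean, nan_mean, plain_mean_is_nan)
```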

In [None]:
np.inf

In [None]:
np.isfinite(np.inf)

In [None]:
# Inverse square of x; raises ZeroDivisionError when x is 0
def inv(x):
    return x**(-2)

In [None]:
inv(0)

In [None]:
# Same calculation, but return infinity instead of raising an error at x == 0
def inverse(x):
    if x == 0:
        val = np.inf
    else:
        val = x**(-2)
    return val

In [None]:
inverse(0)
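
The function above handles one number at a time. As a hedged sketch of an array-friendly variant, the hypothetical helper `inverse_square` below relies on NumPy's floating-point arithmetic, which already returns `inf` for a zero raised to a negative power. `np.errstate` is used to silence the divide-by-zero warning that NumPy would otherwise emit.

```python
import numpy as np


def inverse_square(x):
    """Elementwise x**(-2); entries equal to 0 map to np.inf."""
    x = np.asarray(x, dtype=float)
    # Suppress the RuntimeWarning for division by zero inside this block.
    with np.errstate(divide="ignore"):
        return x ** -2.0


values = inverse_square([0, 1, 2])

print(values)
print(np.isfinite(values))
```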