# Libraries in General and NumPy in Particular

## 1. Importing Python Libraries

Part of the reason why Python is such a powerful tool for data science is that other people have written and optimized functions and wrapped them into **libraries** that we can bring into our own work.

![numpy](https://raw.githubusercontent.com/donnemartin/data-science-ipython-notebooks/master/images/numpy.png)

[NumPy](https://www.numpy.org/) is the fundamental package for scientific computing with Python. 


To use a package in your current workspace, type `import` followed by the name of the library, as shown below.

In [1]:
import numpy

That worked because numpy is [included with Anaconda](https://docs.anaconda.com/anaconda/packages/py3.7_osx-64/), so it was installed along with Anaconda. Other packages must be installed before you can use them.

Let's try importing azure (which is a library for using MS Azure in Python).

In [79]:
import azure

That failed because azure is **not** included with Anaconda. In order to get this package, we'll have to install it first:

In [6]:
# I'll install this using pip, the package installer for Python.
# This is a shell command, but I can run it in the notebook by
# prefixing the command with a '!' ("bang"):

!pip install azure

Now I can import the library:

In [5]:
import azure
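To check ahead of time whether a package can be imported, one option is `importlib.util.find_spec` from the standard library. This is a minimal sketch; the helper name `is_installed` and the made-up package name are hypothetical and only for illustration.

```python
import importlib.util


def is_installed(package_name):
    """Return True if a top-level package with this name can be found."""
    return importlib.util.find_spec(package_name) is not None


# 'json' ships with Python, so it should always be found.
json_available = is_installed("json")

# A made-up name that should not exist in any environment.
missing_available = is_installed("surely_not_a_real_package_xyz")

print(json_available, missing_available)
```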

Many packages have standard import aliases. We create an alias with the Python keyword `as`. For numpy, the standard alias is `np`.

In [80]:
import numpy as np

x = np.array([1,2,3])
print(x)
type(x)

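Of course, we could use almost any alias we like. Python keywords such as `for` or `class` are not allowed (they raise a `SyntaxError`), but the names of built-ins such as `list` or `sum` are. If we used one of those, we'd overwrite the meaning of that built-in in our workspace.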
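A quick way to see how aliases interact with reserved words and built-in names is sketched below. It uses the standard-library `math` module as a stand-in for any library; the deliberately bad alias `sum` is chosen only for illustration.

```python
import builtins

# A reserved keyword cannot be used as an alias: the statement does not even compile.
try:
    compile("import math as for", "<example>", "exec")
    keyword_alias_allowed = True
except SyntaxError:
    keyword_alias_allowed = False

# A built-in name is accepted, but it now refers to the module, not the function.
import math as sum  # deliberately bad alias

try:
    total = sum([1, 2, 3])
except TypeError:
    # Modules are not callable, so the familiar call now fails.
    total = None

# The original function is still reachable through the builtins module.
total_via_builtins = builtins.sum([1, 2, 3])

print(keyword_alias_allowed, total, total_via_builtins)
```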

## 2. NumPy versus base Python

Now that we know libraries exist, why do we want to use them? Let us examine a comparison between base Python and Numpy.

Base Python has lists and can do basic arithmetic. NumPy adds a helpful object called the array.

NumPy arrays have a few advantages over base Python lists, which we will look at below.

In [17]:
names_list = ['Bob', 'John', 'Sally']

#use numpy.array for numbers and numpy.char.array for strings

names_array = numpy.char.array(['Bob','John','Sally'])

print(names_list)
print(names_array)

In [29]:
# Make a list and an array of three numbers

#your code here
numbers_list =
numbers_array =

In [31]:
# multiply your array by 3



In [32]:
# multiply your list by 3



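NumPy arrays support elementwise arithmetic: multiplying an array by 3 multiplies each element, and dividing an array by a number divides each element. Python lists do not: multiplying a list by 3 repeats the list three times, and dividing a list raises a `TypeError`. There are other things that make NumPy more useful than base Python for evaluating data.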
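The difference between list and array arithmetic can be sketched as follows. The names `sample_list` and `sample_array` are illustrative and chosen so that they do not overwrite the variables from the exercise above.

```python
import numpy as np

sample_list = [1, 2, 3]
sample_array = np.array(sample_list)

# Multiplying a list repeats it; multiplying an array scales each element.
repeated = sample_list * 3      # [1, 2, 3, 1, 2, 3, 1, 2, 3]
scaled = sample_array * 3       # array([3, 6, 9])

# Division works elementwise on arrays...
halved = sample_array / 2       # array([0.5, 1. , 1.5])

# ...but is not defined for lists at all.
try:
    sample_list / 2
    list_division_ok = True
except TypeError:
    list_division_ok = False

print(repeated, scaled, halved, list_division_ok)
```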

Below, you will find a piece of code we will use to compare the speed of operations on a list and operations on an array.

In [21]:
size_of_vec = 1000

# Build real lists so that we time list operations rather than range objects
X = list(range(size_of_vec))
Y = list(range(size_of_vec))

In [25]:
%timeit [X[i] + Y[i] for i in range(len(X))]

In [23]:
X = np.arange(size_of_vec)
Y = np.arange(size_of_vec)

In [26]:
%timeit X + Y
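`%timeit` is an IPython magic and only works in a notebook or IPython shell. Outside of those, a similar comparison can be sketched with the standard-library `timeit` module. The variable names and the choice of 1000 repetitions are arbitrary assumptions, and the timings will vary from machine to machine.

```python
import timeit

import numpy as np

size_of_vec = 1000

list_x = list(range(size_of_vec))
list_y = list(range(size_of_vec))
array_x = np.arange(size_of_vec)
array_y = np.arange(size_of_vec)

# Total time for 1000 repetitions of an elementwise sum over two lists.
list_seconds = timeit.timeit(
    lambda: [a + b for a, b in zip(list_x, list_y)], number=1000
)

# Total time for 1000 repetitions of the same sum over two arrays.
array_seconds = timeit.timeit(lambda: array_x + array_y, number=1000)

print(f"lists:  {list_seconds:.4f} s for 1000 runs")
print(f"arrays: {array_seconds:.4f} s for 1000 runs")
```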

## 3. What Else Can Numpy Do?

Type `numbers_list.` and then hit `TAB`. What options do you have?

In [37]:
numbers_list.

The names of standard Python list attributes and methods appear:

- `append(x)` (add x to the end of the list)
- `clear()` (delete all elements of the list)
- `copy()` (make a copy of the list)
- `count(x)` (return the number of instances of x in the list)
- `extend([x, y])` (add x and y to the end of the list)
- `index(x)` (return the position of the first x in the list)
- `insert(x, y)` (insert y into position x in the list)
- `pop(i=-1)` (remove and return the element at position i in the list; by default, the last one)
- `remove(x)` (remove the first x from the list)
- `reverse()` (reverse the order of the elements of the list, in place)
- `sort()` (sort the elements of the list, in place)

Now type `numbers_array.` and then hit `TAB`. What options do you have?

In [61]:
numbers_array.

Now there are many new options!

Exercise: Write down quick one-liners to describe what these methods do:


- `max()`
- `mean()`
- `min()`
- `round()`
- `std()`
- `sum()`

### Better Math Tools

#### Trigonometry:
- `np.pi` for $\pi$

In [81]:
np.pi

- `np.sin()` for the sine function

In [82]:
np.sin(np.pi / 6)

- `np.cos()` for the cosine function
- `np.tan()` for the tangent function
- `np.sinh()` for the hyperbolic sine function
- `np.cosh()` for the hyperbolic cosine function
- `np.tanh()` for the hyperbolic tangent function
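These functions also accept whole arrays and are applied elementwise. A short sketch, with input values chosen purely for illustration:

```python
import numpy as np

# Sine of 0, pi/4, and pi/2 in a single call.
angles = np.array([0.0, np.pi / 4, np.pi / 2])
sines = np.sin(angles)            # approximately [0, 0.7071, 1]

# tan(pi/4) is 1, up to floating-point rounding.
tangent_45 = np.tan(np.pi / 4)

# The hyperbolic identity cosh(x)^2 - sinh(x)^2 = 1, checked at x = 1.
identity = np.cosh(1.0) ** 2 - np.sinh(1.0) ** 2

print(sines, tangent_45, identity)
```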

#### Number Theory:
- `np.binary_repr()` to convert from decimal to binary

In [83]:
np.binary_repr(10)

- `np.diff()` to calculate, recursively, the differences between sequence terms

In [84]:
np.diff([1, 4, 9, 16])

In [85]:
np.diff([1, 4, 9, 16], n=2)
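"Recursively" here means that `n=2` applies the difference operation twice. A small check, using the same sequence of squares:

```python
import numpy as np

squares = np.array([1, 4, 9, 16])

first_differences = np.diff(squares)          # array([3, 5, 7])
second_differences = np.diff(squares, n=2)    # array([2, 2])

# Taking differences of the differences gives the same result as n=2.
second_by_hand = np.diff(np.diff(squares))

print(first_differences, second_differences, second_by_hand)
```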

- `np.gcd()` for the greatest common divisor

In [86]:
np.gcd(8, 100)

#### Array Logic:
- `np.bitwise_not()`
- `np.bitwise_and()`

In [87]:
np.bitwise_and([True, False, True], [False, True, True])

- `np.bitwise_or()`
- `np.bitwise_xor()`
- `np.concatenate()`

In [88]:
np.concatenate([[1, 2], [3, 4]])
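The remaining bitwise functions follow the same pattern as `np.bitwise_and()`. A brief sketch on Boolean arrays, plus one integer example where the operation acts on the binary digits (array names are illustrative):

```python
import numpy as np

a = np.array([True, False, True])
b = np.array([False, True, True])

either = np.bitwise_or(a, b)          # array([ True,  True,  True])
exactly_one = np.bitwise_xor(a, b)    # array([ True,  True, False])
flipped = np.bitwise_not(a)           # array([False,  True, False])

# On integers the operation is applied bit by bit: 1100 & 1010 = 1000.
bits_in_common = np.bitwise_and(12, 10)

print(either, exactly_one, flipped, bits_in_common)
```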

#### Complex Numbers:
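- `complex()` (Python's built-in; the old alias `np.complex` was deprecated in NumPy 1.20 and removed in NumPy 1.24) or `np.complex128()`

In [89]:
complex(2, -3)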
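Once complex numbers are stored in an array, NumPy exposes their parts and a few common operations. A minimal sketch using Python's built-in `complex` to create the values (the variable names are illustrative):

```python
import numpy as np

z = complex(2, -3)
z_array = np.array([z, 1j])

real_parts = z_array.real           # array([2., 0.])
imag_parts = z_array.imag           # array([-3.,  1.])
conjugates = np.conj(z_array)       # array([2.+3.j, 0.-1.j])

# The absolute value of 3 + 4i is 5.
modulus = np.abs(np.array([3 + 4j]))

print(z_array.dtype, real_parts, imag_parts, conjugates, modulus)
```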

#### Data Analysis:
- `np.histogram()`

In [90]:
np.histogram([1, 2])
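`np.histogram()` returns two arrays: the counts per bin and the bin edges. By default it uses 10 bins; the sketch below passes `bins=3` on a small made-up sample so that the output is easy to read.

```python
import numpy as np

sample = [1, 2, 2, 3, 3, 3]

# Three equal-width bins spanning the range of the data (1 to 3).
counts, edges = np.histogram(sample, bins=3)

print(counts)   # one count per bin
print(edges)    # four edges delimit three bins
```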

#### Logarithms:
- `np.exp()` for the exponential function $e^x$

In [91]:
np.exp(2)

- `np.log()` for natural (base-$e$) logarithms

In [92]:
np.log(10)
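NumPy also provides logarithms in other bases. A short sketch showing `np.log10()` and `np.log2()`, and that `np.exp()` undoes `np.log()`:

```python
import numpy as np

natural_log_of_e = np.log(np.e)        # approximately 1
log10_of_thousand = np.log10(1000.0)   # approximately 3
log2_of_eight = np.log2(8.0)           # approximately 3

# exp and log are inverses of each other, up to floating-point rounding.
round_trip = np.exp(np.log(10.0))

print(natural_log_of_e, log10_of_thousand, log2_of_eight, round_trip)
```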

#### Linear Algebra:

`np.linalg` is an incredibly useful module for matrix mathematics, which we shall need in future lessons!
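As a small taste of `np.linalg`, the sketch below solves a 2x2 linear system and computes a determinant. The matrix and right-hand side are made up for illustration.

```python
import numpy as np

# Solve the system  2x + y = 3,  x + 3y = 5.
coefficients = np.array([[2.0, 1.0], [1.0, 3.0]])
right_hand_side = np.array([3.0, 5.0])

solution = np.linalg.solve(coefficients, right_hand_side)   # x = 0.8, y = 1.4
determinant = np.linalg.det(coefficients)                   # 2*3 - 1*1 = 5

# Multiplying the matrix by the solution should reproduce the right-hand side.
check = coefficients @ solution

print(solution, determinant, check)
```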

### More Tools

See [here](https://numpy.org/devdocs/user/basics.html) for more information about numpy. Let's go over some of these points:

#### [More numeric data types than base Python](https://numpy.org/devdocs/user/basics.types.html)
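For instance, NumPy lets us choose how many bytes each number occupies. A minimal sketch with an 8-bit integer array (the array name is illustrative):

```python
import numpy as np

small_ints = np.array([1, 2, 3], dtype=np.int8)

bytes_per_element = small_ints.itemsize     # 1 byte per int8 value
largest_int8 = np.iinfo(np.int8).max        # 127

# astype() makes a converted copy with a different data type.
as_floats = small_ints.astype(np.float32)

print(small_ints.dtype, bytes_per_element, largest_int8, as_floats.dtype)
```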

#### Intrinsic array constructors:

In [93]:
print(np.zeros(10))
print(np.ones(10))
print(np.arange(10, dtype=float))
print(np.linspace(0.1, 1, 10))

#### Multi-dimensional indexing:

In [103]:
nums = np.array([[1, 2, 3], [4, 5, 6]])
nums.shape

In [105]:
nums[0, 2]

Why is this more efficient than `nums[0][2]`?

In [108]:
%timeit nums[0, 2]

In [109]:
%timeit nums[0][2]
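One likely explanation for the difference: `nums[0][2]` performs two separate indexing operations and builds an intermediate row array along the way, while `nums[0, 2]` looks up the element in a single step. The sketch below makes the intermediate object visible.

```python
import numpy as np

nums = np.array([[1, 2, 3], [4, 5, 6]])

# Chained indexing first creates a 1-D array for row 0...
first_row = nums[0]
# ...and then indexes into that new array.
chained = first_row[2]

# Tuple indexing reaches the same element directly.
direct = nums[0, 2]

# The comma syntax also allows slicing along any axis, e.g. a whole column.
second_column = nums[:, 1]

print(type(first_row), chained, direct, second_column)
```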

#### Filtering:

In [94]:
data = np.array([10, 3, 4, 7, 6])

In [100]:
data[data < 5]
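Filtering works because the comparison `data < 5` itself produces a Boolean array, which is then used as a mask. A short sketch that also combines two conditions with `&` (the parentheses are required) and finds matching positions with `np.where()`:

```python
import numpy as np

data = np.array([10, 3, 4, 7, 6])

mask = data < 5                            # array([False,  True,  True, False, False])
small_values = data[mask]                  # array([3, 4])

# Combine conditions elementwise with & (and) or | (or).
middle_values = data[(data > 3) & (data < 10)]   # array([4, 7, 6])

# np.where returns the positions at which the condition holds.
small_positions = np.where(data < 5)[0]    # array([1, 2])

print(mask, small_values, middle_values, small_positions)
```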

#### Broadcasting:

In [114]:
arr1 = np.array([-1, -2, -3])
arr2 = -8

In [116]:
arr1 + arr2

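Two arrays can be broadcast together if, comparing their shapes from the last dimension backward, each pair of dimensions either has *the same* value or one of the two has a value of *1*. A missing leading dimension is treated as 1.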

In [117]:
arr3 = np.array([[-10., 3., 175.2], [25., 1.47, 9.36]])
arr4 = np.array([5, 5, 5])

In [119]:
arr3 * arr4
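The broadcasting rule can be explored by checking shapes. The sketch below uses made-up arrays: two compatible combinations and one incompatible combination that raises a `ValueError`.

```python
import numpy as np

matrix = np.ones((2, 3))                   # shape (2, 3)
row = np.array([10.0, 20.0, 30.0])         # shape (3,): matches the last dimension
column = np.array([[1.0], [2.0]])          # shape (2, 1): the 1 stretches to 3

row_sum = matrix + row                     # each row gets 10, 20, 30 added
column_product = matrix * column           # row 0 times 1, row 1 times 2

# Shape (2,) does not match the trailing dimension 3, and neither value is 1.
try:
    matrix + np.array([1.0, 2.0])
    incompatible_ok = True
except ValueError:
    incompatible_ok = False

print(row_sum.shape, column_product, incompatible_ok)
```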

#### np.nan and np.inf

NaN stands for "not a number". NumPy's `np.nan` is a handy way of representing these, in part because `np.nan` *is a float!*

In [128]:
type(np.nan)

This makes it convenient to perform mathematical operations on arrays that contain NaNs.

In [2]:
arr5 = np.array([1, 10, np.nan])

In [4]:
arr5.mean()

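Even though the array has a NaN, we don't get an error in calculating its mean; the result is simply `nan`, because NaN propagates through arithmetic. To get a number instead, we can treat the NaN as zero when summing (note that `len(arr5)` still counts the NaN entry):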

In [94]:
np.nansum(arr5) / len(arr5)

Or we can ignore the NaN entirely, so that only the two real values are averaged:

In [95]:
np.nanmean(arr5)
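The three approaches give three different answers, which a side-by-side comparison makes clear. The sketch below uses a separate array named `values` (an illustrative name) with the same contents, and also shows how to detect NaNs, since `np.nan == np.nan` is `False`.

```python
import numpy as np

values = np.array([1, 10, np.nan])

plain_mean = values.mean()                           # nan: NaN propagates
zero_filled_mean = np.nansum(values) / len(values)   # (1 + 10) / 3
ignoring_mean = np.nanmean(values)                   # (1 + 10) / 2 = 5.5

# NaN never compares equal to anything, including itself, so use np.isnan.
nan_equals_itself = np.nan == np.nan
nan_mask = np.isnan(values)
clean_values = values[~nan_mask]                     # array([ 1., 10.])

print(plain_mean, zero_filled_mean, ignoring_mean, nan_mask, clean_values)
```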

In [96]:
np.inf

In [97]:
np.isfinite(np.inf)

In [27]:
def inv(x):
    # Inverse square of x; raises ZeroDivisionError when x is 0
    return x**(-2)

In [98]:
inv(0)

In [30]:
def inverse(x):
    # Inverse square of x, returning infinity instead of an error at x = 0
    if x == 0:
        val = np.inf
    else:
        val = x**(-2)
    return val

In [99]:
inverse(0)
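For whole arrays, NumPy's floating-point division already produces `inf` when dividing by zero and emits a warning rather than raising an error. A minimal sketch of a vectorized version follows; the function name `inverse_square` is hypothetical, and `np.errstate` is used only to silence the divide-by-zero warning.

```python
import numpy as np


def inverse_square(values):
    """Return 1 / x**2 elementwise, giving inf wherever x is 0."""
    values = np.asarray(values, dtype=float)
    # Suppress the RuntimeWarning that NumPy emits for division by zero.
    with np.errstate(divide="ignore"):
        return 1.0 / values**2


result = inverse_square([0, 1, 2])   # array([ inf, 1.  , 0.25])
finite_mask = np.isfinite(result)    # array([False,  True,  True])

print(result, finite_mask)
```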

## Exercises

1. Write down descriptions of what these methods do:

- `all()`
- `any()`
- `cumprod()`
- `cumsum()`
- `reshape()`
- `shape` --> No parentheses! This is an attribute and not a method.

2. Calculate the following using numpy:

- $\cos\left(\frac{\pi}{3}\right)$
- $\cosh\left(\frac{\pi}{3}\right)$

3. Write a function that will return the sum of the first $n$ terms of an array, where the user inputs both $n$ and the array.

4. Write a function that will return the logarithm of the standard deviation of an input array.

5. Use numpy to multiply the complex numbers $257 + 134i$ and $987 - 643i$.

6. **Euler's Formula.** How could we use numpy to test Euler's Formula: $e^{\pi i} + 1 = 0$?