# Libraries in General and NumPy in Particular

## 1. Importing Python Libraries

Part of the reason why Python is such a powerful tool for data science is that other people have written and optimized functions and wrapped them into **libraries** that we can bring into our own work.

![numpy](https://raw.githubusercontent.com/donnemartin/data-science-ipython-notebooks/master/images/numpy.png)

[NumPy](https://www.numpy.org/) is the fundamental package for scientific computing with Python. 


To use a package in your current workspace, type `import` followed by the name of the library, as shown below.

In [2]:
import numpy

That worked because numpy is [included with Anaconda](https://docs.anaconda.com/anaconda/packages/py3.7_osx-64/), so numpy was installed when you installed Anaconda. Other packages will need to be installed before you can use them.

Many packages have standard import aliases. We effect this aliasing by using the Python keyword `as`. For numpy, the standard alias is `np`.

In [3]:
import numpy as np

x = np.array([1,2,3])
print(x)
type(x)

[1 2 3]


numpy.ndarray

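Of course, we could use any alias we like, even the name of a built-in such as `list` or `print` (a reserved keyword such as `for` is not allowed and raises a `SyntaxError`). But if we did this, we would shadow that built-in and lose easy access to it.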
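A quick way to see the difference between a reserved keyword and a built-in name. This sketch uses `compile()` so the keyword case is caught as a `SyntaxError` instead of stopping the notebook; aliasing numpy as `list` is deliberately bad practice and is undone at the end.

```python
import builtins

# A reserved keyword cannot be an alias: the parser rejects it outright.
try:
    compile("import numpy as for", "<alias-demo>", "exec")
    keyword_alias_ok = True
except SyntaxError:
    keyword_alias_ok = False
print("keyword alias allowed?", keyword_alias_ok)

# A built-in name *can* be an alias, but it shadows the built-in.
import numpy as list  # legal, but a bad idea
print(list.array([1, 2, 3]))  # 'list' now means numpy

# The original built-in is still reachable through the builtins module.
print(builtins.list("abc"))

# Rebind the name to the real built-in to undo the damage.
list = builtins.list
print(list("abc"))
```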

## 2. NumPy versus base Python

Now that we know libraries exist, why do we want to use them? Let us compare base Python with NumPy.

Base Python has lists and can do basic arithmetic. NumPy adds a more powerful object: the array.

NumPy arrays have several advantages over base Python lists, which we will look at below.

In [4]:
names_list = ['Bob', 'John', 'Sally']

# np.array works for strings too; numpy.char.array builds a legacy chararray with vectorized string methods

names_array = numpy.char.array(['Bob','John','Sally'])

print(names_list)
print(names_array)

['Bob', 'John', 'Sally']
['Bob' 'John' 'Sally']


In [5]:
# Make a list and an array of three numbers
numbers_list = [1, 2, 3]
numbers_array = np.array([1, 2, 3])

In [6]:
# multiply your array by 3
numbers_array * 3

array([3, 6, 9])

In [7]:
# multiply your list by 3
numbers_list * 3

[1, 2, 3, 1, 2, 3, 1, 2, 3]

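NumPy arrays support elementwise arithmetic, including the division operator `/`, while Python lists do not: `*` repeats a list rather than multiplying its elements, and dividing a list raises a `TypeError`. Speed is another reason to prefer NumPy over base Python when evaluating data.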
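A minimal sketch of the difference, using a three-element list and array like the ones in the exercise cells:

```python
import numpy as np

numbers_list = [1, 2, 3]
numbers_array = np.array([1, 2, 3])

# '*' means repetition for lists but elementwise multiplication for arrays
repeated = numbers_list * 3   # [1, 2, 3, 1, 2, 3, 1, 2, 3]
scaled = numbers_array * 3    # array([3, 6, 9])

# '/' is elementwise for arrays...
halved = numbers_array / 2    # array([0.5, 1. , 1.5])

# ...but lists do not define division at all
try:
    numbers_list / 2
    list_division_ok = True
except TypeError:
    list_division_ok = False

print(repeated, scaled, halved, list_division_ok)
```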

The cells below compare the speed of elementwise addition on plain Python sequences (here `range` objects, which support indexing just like lists) and on NumPy arrays.

In [8]:
size_of_vec = 1000

X = range(size_of_vec)
Y = range(size_of_vec)

In [9]:
%timeit [X[i] + Y[i] for i in range(len(X))]

318 µs ± 4.07 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)


In [10]:
X = np.arange(size_of_vec)
Y = np.arange(size_of_vec)

In [11]:
%timeit X + Y

973 ns ± 19.1 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
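`%timeit` is an IPython magic, so it only works inside Jupyter or IPython. A rough equivalent for a plain Python script uses the standard-library `timeit` module. This sketch iterates with `zip` instead of indexing, which is the more idiomatic way to write the list version; the exact timings will vary from machine to machine.

```python
import timeit

import numpy as np

size_of_vec = 1000
X_list, Y_list = list(range(size_of_vec)), list(range(size_of_vec))
X_arr, Y_arr = np.arange(size_of_vec), np.arange(size_of_vec)

n_runs = 1000
# Total seconds taken by n_runs repetitions of each approach
t_list = timeit.timeit(lambda: [x + y for x, y in zip(X_list, Y_list)], number=n_runs)
t_array = timeit.timeit(lambda: X_arr + Y_arr, number=n_runs)

print(f"list comprehension: {t_list / n_runs * 1e6:.1f} µs per loop")
print(f"numpy addition:     {t_array / n_runs * 1e6:.1f} µs per loop")
```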


## 3. What Else Can NumPy Do?

Type `numbers_list.` in a code cell and then hit `TAB` (without running the cell). What options do you have?

In [12]:
numbers_list.


The names of standard Python list attributes and methods appear:

- `append(x)` (add x to the end of the list)
- `clear()` (delete all elements of the list)
- `copy()` (return a shallow copy of the list)
- `count(x)` (return the number of times x appears in the list)
- `extend([x, y])` (add x and y to the end of the list)
- `index(x)` (return the position of the first occurrence of x in the list)
- `insert(i, x)` (insert x at position i in the list)
- `pop(i=-1)` (remove and return the element at position i; by default the last one)
- `remove(x)` (remove the first occurrence of x from the list)
- `reverse()` (reverse the order of the elements in place)
- `sort()` (sort the elements of the list in place)
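A short walk through several of these methods on a small list; each comment shows the state of the list after that line:

```python
nums = [3, 1, 2]
nums.append(5)                # [3, 1, 2, 5]
nums.extend([1, 4])           # [3, 1, 2, 5, 1, 4]
nums.insert(0, 9)             # [9, 3, 1, 2, 5, 1, 4]
count_of_ones = nums.count(1) # 2
position_of_5 = nums.index(5) # 4
last = nums.pop()             # returns 4; nums is [9, 3, 1, 2, 5, 1]
nums.remove(1)                # removes only the first 1: [9, 3, 2, 5, 1]
nums.sort()                   # in place: [1, 2, 3, 5, 9]
nums.reverse()                # in place: [9, 5, 3, 2, 1]
print(nums, count_of_ones, position_of_5, last)
```

Note that `sort()` and `reverse()` modify the list in place and return `None`, so `nums = nums.sort()` would throw the list away.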

Now type `numbers_array.` and then hit `TAB`. What options do you have?

In [None]:
numbers_array.

Now there are many new options!

Exercise: Write down quick one-liners to describe what these methods do:


- `max()`
- `mean()`
- `min()`
- `round()`
- `std()`
- `sum()`
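One possible set of answers, checked on a small array chosen so the results are easy to verify by hand (mean 5, population standard deviation 2):

```python
import numpy as np

values = np.array([2, 4, 4, 4, 5, 5, 7, 9])

print(values.max())   # largest element: 9
print(values.min())   # smallest element: 2
print(values.sum())   # total of all elements: 40
print(values.mean())  # arithmetic mean: 5.0
print(values.std())   # population standard deviation (ddof=0 by default): 2.0

# round() rounds each element to the given number of decimals
rounded = np.array([1.234, 5.678]).round(1)
print(rounded)        # [1.2 5.7]
```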

### Better Math Tools

#### Trigonometry:
- `np.pi` for $\pi$

In [None]:
np.pi

- `np.sin()` for the sine function

In [None]:
np.sin(np.pi / 6)

- `np.cos()` for the cosine function
- `np.tan()` for the tangent function
- `np.sinh()` for the hyperbolic sine function
- `np.cosh()` for the hyperbolic cosine function
- `np.tanh()` for the hyperbolic tangent function

#### Number Theory:
- `np.binary_repr()` to get the binary representation of an integer as a string

In [None]:
np.binary_repr(10)

- `np.diff()` to calculate the differences between consecutive terms of a sequence; the `n` argument repeats this `n` times, recursively

In [None]:
np.diff([1, 4, 9, 16])

In [None]:
np.diff([1, 4, 9, 16], n=2)
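To see what "recursively" means here: `n=2` is the same as taking the differences of the differences. For the squares, the second differences are constant:

```python
import numpy as np

squares = np.array([1, 4, 9, 16])

first = np.diff(squares)        # consecutive differences: [3 5 7]
second = np.diff(squares, n=2)  # differences of the differences: [2 2]

# n=2 is the same as applying np.diff twice
print(first, second, np.array_equal(second, np.diff(first)))
```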

- `np.gcd()` for the greatest common divisor

In [None]:
np.gcd(8, 100)

#### Array Logic:
- `np.bitwise_not()`
- `np.bitwise_and()`

In [None]:
np.bitwise_and([True, False, True], [False, True, True])

- `np.bitwise_or()`
- `np.bitwise_xor()`
- `np.concatenate()`

In [None]:
np.concatenate([[1, 2], [3, 4]])
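The bitwise functions act elementwise. On boolean arrays they behave like logical NOT, AND, OR, and XOR; on integers they operate on the individual bits (for signed integers, `np.bitwise_not` flips every bit, so it is not a logical NOT there). A small sketch:

```python
import numpy as np

a = np.array([True, False, True])
b = np.array([False, True, True])

print(np.bitwise_and(a, b))  # [False False  True]
print(np.bitwise_or(a, b))   # [ True  True  True]
print(np.bitwise_xor(a, b))  # [ True  True False]
print(np.bitwise_not(a))     # [False  True False]

# On integers the same functions work bit by bit:
# 12 = 0b1100 and 10 = 0b1010
print(np.bitwise_and(12, 10))  # 8  (0b1000)
print(np.bitwise_or(12, 10))   # 14 (0b1110)
print(np.bitwise_xor(12, 10))  # 6  (0b0110)
```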

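#### Complex Numbers:
- `complex()`: older notebooks call `np.complex()`, but that was only an alias for Python's built-in `complex`; it was deprecated in NumPy 1.20 and removed in NumPy 1.24

In [None]:
complex(2, -3)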

#### Data Analysis:
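- `np.histogram()` counts how many values fall into each bin (10 equal-width bins by default) and returns the counts along with the bin edges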

In [13]:
np.histogram([1, 2])

(array([1, 0, 0, 0, 0, 0, 0, 0, 0, 1]),
 array([1. , 1.1, 1.2, 1.3, 1.4, 1.5, 1.6, 1.7, 1.8, 1.9, 2. ]))
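The `bins` argument controls the binning, either as a number of equal-width bins or as an explicit list of edges. A small sketch with data chosen so the counts are easy to check:

```python
import numpy as np

data = [1, 2, 2, 3, 3, 3]

# Split the range [1, 3] into 3 equal-width bins and count the values in each
counts, edges = np.histogram(data, bins=3)
print(counts)  # [1 2 3]
print(edges)   # bin edges: 1, about 1.667, about 2.333, 3

# Explicit edges also work; every bin but the last is half-open [a, b)
counts2, _ = np.histogram(data, bins=[0, 2, 4])
print(counts2)  # [1 5]
```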

#### Exponentials and Logarithms:
- `np.exp()` for Euler's number raised to a power, $e^x$

In [14]:
np.exp(2)

7.38905609893065

- `np.log()` for the natural logarithm (base $e$); `np.log10()` and `np.log2()` give bases 10 and 2

In [15]:
np.log(10)

2.302585092994046
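A brief sketch of the other bases and of how `np.exp` and `np.log` relate. The change-of-base line uses the identity $\log_b x = \ln x / \ln b$:

```python
import numpy as np

x = np.array([1.0, 10.0, 100.0])

print(np.log10(x))  # [0. 1. 2.]
print(np.log2(8))   # 3.0

# exp and log undo each other (up to floating-point rounding)
print(np.allclose(np.exp(np.log(x)), x))  # True

# Change of base: log_b(x) = ln(x) / ln(b)
print(np.allclose(np.log(x) / np.log(10), np.log10(x)))  # True
```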

#### Linear Algebra:

`np.linalg` is an incredibly useful module for matrix mathematics, which we shall need in future lessons!
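As a small preview, here is a sketch that solves the system $2x + y = 3$, $x + 3y = 5$ (whose solution is $x = 0.8$, $y = 1.4$) with `np.linalg`, and also computes the determinant and inverse of the coefficient matrix:

```python
import numpy as np

# The system 2x + y = 3, x + 3y = 5 written as A @ v = b
A = np.array([[2.0, 1.0],
              [1.0, 3.0]])
b = np.array([3.0, 5.0])

v = np.linalg.solve(A, b)  # solution vector, approximately [0.8, 1.4]
det = np.linalg.det(A)     # 2*3 - 1*1 = 5, up to rounding
A_inv = np.linalg.inv(A)   # matrix inverse

print(v, det)
print(np.allclose(A @ A_inv, np.eye(2)))  # True
```

`np.linalg.solve` is generally preferred over computing `inv(A) @ b`, since it avoids forming the inverse explicitly.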

### More Tools

See the [NumPy fundamentals](https://numpy.org/devdocs/user/basics.html) documentation for more information. Let's go over some of these points:

#### [More numeric data types than base Python](https://numpy.org/devdocs/user/basics.types.html)
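Base Python has one integer type (with arbitrary precision) and one `float` type (a 64-bit double on typical platforms). NumPy lets you pick the size and kind of each element through the `dtype` argument. A small sketch:

```python
import numpy as np

small = np.array([1, 2, 3], dtype=np.int8)
print(small.dtype, small.itemsize)  # int8 1  (one byte per element)

single = np.array([1.5, 2.5], dtype=np.float32)
print(single.dtype)  # float32

# np.iinfo / np.finfo report the limits of each type
print(np.iinfo(np.int8).min, np.iinfo(np.int8).max)  # -128 127
print(np.finfo(np.float32).eps)  # machine epsilon for float32
```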

#### Intrinsic array constructors:

In [16]:
print(np.zeros(10))
print(np.ones(10))
print(np.arange(10, dtype=float))
print(np.linspace(0.1, 1, 10))  # 10 evenly spaced points from 0.1 to 1, inclusive

[0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
[1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
[0. 1. 2. 3. 4. 5. 6. 7. 8. 9.]
[0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1. ]
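Two details worth knowing about these constructors: they accept a shape tuple for multi-dimensional arrays, and `np.arange` excludes its stop value while `np.linspace` includes it. For non-integer steps, the NumPy documentation recommends `linspace`, since floating-point rounding can make the number of `arange` elements hard to predict.

```python
import numpy as np

# Constructors accept a shape tuple for multi-dimensional arrays
grid = np.zeros((2, 3))
print(grid.shape)  # (2, 3)

# arange excludes its stop value and takes a step size...
stepped = np.arange(0, 1, 0.25)
print(stepped)     # [0.   0.25 0.5  0.75]

# ...while linspace includes the stop value and takes a number of points
spaced = np.linspace(0, 1, 5)
print(spaced)      # [0.   0.25 0.5  0.75 1.  ]
```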


#### Multi-dimensional indexing:

In [17]:
nums = np.array([[1, 2, 3], [4, 5, 6]])
nums.shape

(2, 3)

In [18]:
nums[0, 2]

3

Why is this more efficient than `nums[0][2]`?

In [19]:
%timeit nums[0, 2]

128 ns ± 0.79 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)


In [20]:
%timeit nums[0][2]

259 ns ± 6.1 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
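A likely answer to the question above: `nums[0][2]` performs two separate indexing operations, and the first one creates an intermediate array object (a view of row 0) that is thrown away right after it is indexed again. `nums[0, 2]` looks the element up in a single step. The sketch below makes that intermediate view visible:

```python
import numpy as np

nums = np.array([[1, 2, 3], [4, 5, 6]])

# Chained indexing: the first step builds a new array object for row 0...
row = nums[0]
print(type(row), row.base is nums)  # a view that shares nums' memory

# ...and only then is that temporary indexed again
print(row[2], nums[0][2], nums[0, 2])  # 3 3 3
```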


#### Filtering:

In [21]:
data = np.array([10, 3, 4, 7, 6])

In [22]:
data[data < 5]

array([3, 4])
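Under the hood, `data < 5` produces a boolean mask with one entry per element, and indexing with that mask keeps the `True` positions. Conditions combine with `&` (and) and `|` (or), each wrapped in parentheses; Python's `and`/`or` do not work elementwise on arrays. `np.where` is a related tool that chooses values elementwise:

```python
import numpy as np

data = np.array([10, 3, 4, 7, 6])

# The comparison alone produces a boolean mask
mask = data < 5
print(mask)  # [False  True  True False False]

# Combine conditions with & and |, wrapping each comparison in parentheses
between = data[(data > 3) & (data < 8)]
print(between)  # [4 7 6]

# np.where picks values elementwise: here, cap every value at 5
capped = np.where(data > 5, 5, data)
print(capped)  # [5 3 4 5 5]
```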

#### Broadcasting:

In [23]:
arr1 = np.array([-1, -2, -3])
arr2 = -8

In [24]:
arr1 + arr2

array([ -9, -10, -11])

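Two arrays can be broadcast together if, comparing their shapes from the trailing (rightmost) dimension backwards, each pair of dimensions is either *the same* or one of them is *1*. An array with fewer dimensions is treated as if it had extra leading dimensions of size 1.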

In [25]:
arr3 = np.array([[-10., 3., 175.2], [25., 1.47, 9.36]])
arr4 = np.array([5, 5, 5])

In [26]:
arr3 * arr4

array([[-50.  ,  15.  , 876.  ],
       [125.  ,   7.35,  46.8 ]])
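A few more shape combinations, as a sketch. `np.broadcast_shapes` (available in NumPy 1.20 and later) reports the resulting shape without building any arrays; the last case fails because the trailing dimensions 3 and 2 differ and neither is 1:

```python
import numpy as np

# Shapes are compared from the right: (2, 3) vs (3,) -> (2, 3)
print(np.broadcast_shapes((2, 3), (3,)))  # (2, 3)
# A column (2, 1) and a row (3,) stretch into a (2, 3) grid
print(np.broadcast_shapes((2, 1), (3,)))  # (2, 3)

col = np.array([[10], [20]])  # shape (2, 1)
row = np.array([1, 2, 3])     # shape (3,)
table = col + row
print(table)                  # [[11 12 13]
                              #  [21 22 23]]

# Trailing dimensions 3 and 2 differ and neither is 1: not compatible
try:
    np.ones((2, 3)) + np.ones(2)
    compatible = True
except ValueError:
    compatible = False
print(compatible)  # False
```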

#### np.nan and np.inf

NaN stands for "not a number". NumPy's `np.nan` is a handy way of representing missing or undefined values, in part because `np.nan` *is a float!*

In [27]:
type(np.nan)

float

This makes it convenient to perform mathematical operations on arrays that contain NaNs.

In [28]:
arr5 = np.array([1, 10, np.nan])

In [29]:
arr5.mean()

nan

Even though the array contains a NaN, calculating its mean raises no error; the NaN simply propagates into the result. We can also skip over NaNs when summing:

In [31]:
np.nansum(arr5) / len(arr5)

3.6666666666666665

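Is this the right measure of the mean? Maybe: `np.nansum` treats the NaN as zero, but `len(arr5)` still counts it in the denominator. If we instead want the mean of only the non-NaN values, we have this: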

In [34]:
np.nanmean(arr5)

5.5
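A few related NaN tools, sketched on the same array. Because NaN compares unequal to everything, itself included, missing values must be found with `np.isnan` rather than `==`:

```python
import numpy as np

arr5 = np.array([1, 10, np.nan])

# np.isnan builds a boolean mask of the missing entries
missing = np.isnan(arr5)
print(missing)  # [False False  True]

# NaN is not equal to anything, itself included
print(np.nan == np.nan)  # False

# Dropping the NaNs by hand gives the same result as np.nanmean
print(arr5[~missing].mean(), np.nanmean(arr5))  # 5.5 5.5

# Other nan-aware reductions follow the same naming pattern
print(np.nanmax(arr5), np.nanmin(arr5))  # 10.0 1.0
```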

`np.info()` prints the documentation for a NumPy object, for example:

In [36]:
np.info(np.nanmean)

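`np.inf` represents infinity, and `np.isfinite()` checks whether a value is neither infinite nor NaN:

In [37]: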
np.isfinite(np.inf)

False

In [38]:
def inv(x):
    # inverse square, 1 / x**2
    return x**(-2)

In [39]:
inv(0)

ZeroDivisionError: 0.0 cannot be raised to a negative power

In [40]:
def inverse(x):
    # like inv, but return np.inf at x == 0 instead of raising an error
    if x == 0:
        val = np.inf
    else:
        val = x**(-2)
    return val

In [41]:
inverse(0)

inf
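The `if` test in `inverse` only works on one number at a time. For whole arrays, a vectorized version can lean on NumPy's float division, which yields `inf` for division by zero instead of raising. This is a sketch; `inverse_square` is a hypothetical helper name, and `np.errstate` is used only to silence the divide-by-zero warning NumPy would otherwise print:

```python
import numpy as np

def inverse_square(x):
    """Return 1 / x**2 elementwise, giving inf (not an error) where x == 0.

    Hypothetical helper for illustration; accepts scalars, lists, or arrays.
    """
    x = np.asarray(x, dtype=float)
    # Float division by zero in NumPy yields inf; errstate silences the warning
    with np.errstate(divide="ignore"):
        return 1.0 / x**2

result = inverse_square([0, 2, -4])
print(result)  # inf, 0.25, 0.0625
```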

## Exercises

1. Write down descriptions of what these methods do:

- `all()` checks whether all elements of an array evaluate to True
- `any()` checks whether any element of an array evaluates to True
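- `cumprod()` returns the cumulative (running) product of the elements
- `cumsum()` returns the cumulative (running) sum of the elements
- `reshape()` returns the same data arranged in a new shape (the total number of elements must not change)
- `shape`: no parentheses! This is an attribute, not a method; it holds the array's dimensions as a tuple.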
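A quick check of these descriptions on a four-element array:

```python
import numpy as np

a = np.array([1, 2, 3, 4])

print(a.all(), np.array([0, 0, 1]).any())  # True True
print(a.cumprod())  # [ 1  2  6 24]
print(a.cumsum())   # [ 1  3  6 10]

grid = a.reshape(2, 2)  # same four values, now 2 rows x 2 columns
print(grid)
print(a.shape, grid.shape)  # (4,) (2, 2)
```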

2. Calculate the following using numpy:

- $\cos\left(\frac{\pi}{3}\right)$
- $\cosh\left(\frac{\pi}{3}\right)$

In [42]:
np.cos(np.pi/3)

0.5000000000000001

In [43]:
np.cosh(np.pi/3)

1.600286857702386

3. Write a function that will return the sum of the first $n$ terms of an array, where the user inputs both $n$ and the array.

In [44]:
def first_n(n, arr):
    # arr[:n] holds the first n terms; its sum is 0 when n == 0
    return np.sum(arr[:n])

4. Write a function that will return the logarithm of the standard deviation of an input array.

In [45]:
def log_std(arr):
    return np.log(np.std(arr))

5. Use numpy to multiply the complex numbers $257 + 134i$ and $987 - 643i$.

In [46]:
complex(257, 134) * complex(987, -643)  # np.complex was only an alias for the built-in complex

(339821-32993j)

6. **Euler's Identity.** How could we use numpy to test Euler's identity $e^{\pi i} + 1 = 0$, the special case $\theta = \pi$ of Euler's formula $e^{i\theta} = \cos\theta + i\sin\theta$?

In [47]:
np.allclose(np.exp(np.pi * 1j) + 1, 0)

True
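The identity is one instance of Euler's formula $e^{i\theta} = \cos\theta + i\sin\theta$, which can be checked the same way on a whole grid of angles at once. The second part shows why `np.allclose` is used instead of `==`: floating-point rounding leaves a residual of about $10^{-16}$ in the imaginary part.

```python
import numpy as np

# Euler's formula: e^{i*theta} = cos(theta) + i*sin(theta), checked on a grid
theta = np.linspace(0, 2 * np.pi, 50)
lhs = np.exp(1j * theta)
rhs = np.cos(theta) + 1j * np.sin(theta)
print(np.allclose(lhs, rhs))  # True

# Exact equality fails for the identity because of floating-point rounding
residual = np.exp(np.pi * 1j) + 1
print(residual == 0, abs(residual) < 1e-15)  # False True
```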