# Libraries in General and NumPy in Particular

## 1. Importing Python Libraries

Part of the reason why Python is such a powerful tool for data science is that other people have written and optimized functions and wrapped them into **libraries** that we can bring into our own work.

![numpy](https://raw.githubusercontent.com/donnemartin/data-science-ipython-notebooks/master/images/numpy.png)

[NumPy](https://www.numpy.org/) is the fundamental package for scientific computing with Python. 


To use a package in your current workspace, type `import` followed by the name of the library, as shown below.

In [1]:
import numpy

That worked because numpy is [included with Anaconda](https://docs.anaconda.com/anaconda/packages/py3.7_osx-64/), so numpy was installed when you installed Anaconda. Other packages will need to be installed before you can use them.

Let's try importing azure (which is a library for using MS Azure in Python).

In [2]:
import azure

ModuleNotFoundError: No module named 'azure'

That failed because azure is **not** included with Anaconda. In order to get this package, we'll have to install it first:

In [4]:
# I'll install this using pip, the package installer for Python.
# This is a shell command, but I can run it in the notebook by
# prefixing the command with a '!' ("bang"):

!pip install azure

Collecting azure
  Downloading azure-4.0.0-py2.py3-none-any.whl (2.2 kB)
Collecting azure-servicemanagement-legacy~=0.20.6
  Downloading azure_servicemanagement_legacy-0.20.6-py2.py3-none-any.whl (78 kB)
Collecting azure-storage-blob~=1.3
  Downloading azure_storage_blob-1.5.0-py2.py3-none-any.whl (75 kB)
Collecting azure-datalake-store~=0.0.18
  Downloading azure_datalake_store-0.0.48-py2.py3-none-any.whl (53 kB)
Collecting azure-servicefabric~=6.3.0.0
  Downloading azure_servicefabric-6.3.0.0-py2.py3-none-any.whl (1.1 MB)
Collecting azure-servicebus~=0.21.1
  Downloading azure_servicebus-0.21.1-py2.py3-none-any.whl (36 kB)
Collecting azure-batch~=4.1
  Downloading azure_batch-4.1.3

Now I can import the library:

In [5]:
import azure

Many packages have standard import aliases, which we create with the Python keyword `as`. For numpy, the standard alias is `np`.

In [6]:
import numpy as np

x = np.array([1,2,3])
print(x)
type(x)

[1 2 3]


numpy.ndarray

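Of course, we could use any valid name we like as an alias, but not a Python keyword: `import numpy as for` is a `SyntaxError`. We *could* reuse the name of a built-in function such as `sum` or `list`, but then we'd overwrite (shadow) the meaning of that built-in.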
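A minimal sketch of what shadowing a built-in looks like, using `sum` as a deliberately bad alias. It assumes the code runs at module level (as in a notebook cell), where deleting the alias makes the built-in `sum` visible again.

```python
import keyword

print(keyword.iskeyword("sum"))   # False: `sum` is a built-in function, not a keyword

import numpy as sum               # legal, but `sum` now names the numpy module
print(sum.array([1, 2, 3]))       # [1 2 3]
# Calling sum([1, 2, 3]) at this point would raise TypeError: 'module' object is not callable

del sum                           # remove the alias; the built-in sum is found again
print(sum([1, 2, 3]))             # 6
```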

## 2. NumPy versus base Python

Now that we know libraries exist, why do we want to use them? Let us compare base Python with NumPy.

Base Python has lists and can do basic arithmetic. NumPy adds a helpful new object: the array.

NumPy arrays have a few advantages over base Python lists, which we will look at now.

In [7]:
names_list = ['Bob', 'John', 'Sally']

# np.array works for strings as well as numbers
# (np.char.array builds the legacy chararray type, not recommended for new code)
names_array = np.array(['Bob', 'John', 'Sally'])

print(names_list)
print(names_array)

['Bob', 'John', 'Sally']
['Bob' 'John' 'Sally']


In [None]:
# Make a list and an array of three numbers

#your code here
numbers_list =
numbers_array =

In [None]:
# multiply your array by 3



In [None]:
# multiply your list by 3



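Multiplying a list by 3 repeats the list three times, whereas multiplying an array by 3 multiplies each element by 3. Arrays also support division with `/` (the `__truediv__` method), which Python lists do not support at all. NumPy has further advantages for working with data, starting with speed.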
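Here is a short sketch of the difference, using example values of our own choosing for `numbers_list` and `numbers_array`:

```python
import numpy as np

numbers_list = [1, 2, 3]
numbers_array = np.array([1, 2, 3])

print(numbers_list * 3)    # [1, 2, 3, 1, 2, 3, 1, 2, 3]: the list is repeated
print(numbers_array * 3)   # [3 6 9]: each element is multiplied
print(numbers_array / 2)   # [0.5 1.  1.5]: element-wise division

try:
    numbers_list / 2       # lists do not define division
except TypeError as err:
    print("TypeError:", err)
```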

The cells below compare the speed of element-wise addition on plain Python sequences (here, `range` objects) and on NumPy arrays.

In [8]:
size_of_vec = 1000

X = range(size_of_vec)
Y = range(size_of_vec)

In [9]:
%timeit [X[i] + Y[i] for i in range(len(X))]

307 µs ± 7.02 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)


In [10]:
X = np.arange(size_of_vec)
Y = np.arange(size_of_vec)

In [11]:
%timeit X + Y

979 ns ± 8.27 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
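`%timeit` is an IPython magic, so it only works in notebooks and the IPython shell. A rough equivalent for a plain Python script uses the standard-library `timeit` module. This sketch uses lists and `zip` instead of indexing, and the exact timings will depend on your machine.

```python
import timeit

import numpy as np

size_of_vec = 1000
X_list = list(range(size_of_vec))
Y_list = list(range(size_of_vec))
X_arr = np.arange(size_of_vec)
Y_arr = np.arange(size_of_vec)

# Both approaches compute the same element-wise sum.
assert [x + y for x, y in zip(X_list, Y_list)] == (X_arr + Y_arr).tolist()

n = 200  # number of repetitions for each measurement
t_list = timeit.timeit(lambda: [x + y for x, y in zip(X_list, Y_list)], number=n)
t_arr = timeit.timeit(lambda: X_arr + Y_arr, number=n)

print(f"list comprehension: {t_list / n * 1e6:.1f} µs per loop")
print(f"numpy addition:     {t_arr / n * 1e6:.1f} µs per loop")
print(f"speed-up: about {t_list / t_arr:.0f}x")
```

The NumPy version is faster because the addition loop runs in compiled code inside NumPy rather than one element at a time in the Python interpreter.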


## 3. What Else Can NumPy Do?

Type `numbers_list.` and then hit `TAB`. What options do you have?

In [None]:
numbers_list.

The names of standard Python list attributes and methods appear:

- `append(x)` (add x to the end of the list)
- `clear()` (delete all elements of the list)
- `copy()` (make a shallow copy of the list)
- `count(x)` (return the number of instances of x in the list)
- `extend([x, y])` (add x and y to the end of the list)
- `index(x)` (return the position of the first occurrence of x in the list)
- `insert(i, x)` (insert x at position i in the list)
- `pop(i=-1)` (remove and return the element at position i, by default the last one)
- `remove(x)` (remove the first occurrence of x from the list)
- `reverse()` (reverse the order of the elements of the list in place)
- `sort()` (sort the elements of the list in place)
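A quick illustration of these methods on a small example list (the values are arbitrary). Note that most of them modify the list in place rather than returning a new one:

```python
numbers = [3, 1, 2]
numbers.append(4)          # [3, 1, 2, 4]
numbers.extend([5, 1])     # [3, 1, 2, 4, 5, 1]
numbers.insert(0, 9)       # [9, 3, 1, 2, 4, 5, 1]
ones = numbers.count(1)    # 2
first = numbers.index(1)   # 2: position of the first 1
numbers.remove(1)          # removes only the first 1: [9, 3, 2, 4, 5, 1]
last = numbers.pop()       # returns 1; the list is now [9, 3, 2, 4, 5]
numbers.sort()             # [2, 3, 4, 5, 9]
numbers.reverse()          # [9, 5, 4, 3, 2]
backup = numbers.copy()    # independent copy of the list
numbers.clear()            # []
print(numbers, backup)     # [] [9, 5, 4, 3, 2]
```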

Now type `numbers_array.` and then hit `TAB`. What options do you have?

In [None]:
numbers_array.

Now there are many new options!

Exercise: Write down quick one-liners to describe what these methods do:


- `max()`
- `mean()`
- `min()`
- `round()`
- `std()`
- `sum()`

### Better Math Tools

#### Trigonometry:
- `np.pi` for $\pi$

In [12]:
np.pi

3.141592653589793

- `np.sin()` for the sine function

In [13]:
np.sin(np.pi / 6)

0.49999999999999994

- `np.cos()` for the cosine function
- `np.tan()` for the tangent function
- `np.sinh()` for the hyperbolic sine function
- `np.cosh()` for the hyperbolic cosine function
- `np.tanh()` for the hyperbolic tangent function
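These functions are applied element-wise, so they accept whole arrays as well as single numbers. Because results such as `0.49999999999999994` are floating-point approximations, it is safer to compare them with a tolerance, for example with `np.isclose()`. A small sketch with a few arbitrary angles:

```python
import numpy as np

angles = np.array([0, np.pi / 6, np.pi / 4, np.pi / 2])

# Each function is applied to every element of the array.
sines = np.sin(angles)
cosines = np.cos(angles)
print(sines)
print(cosines)

# Compare floating-point results with a tolerance rather than ==.
print(np.isclose(np.sin(np.pi / 6), 0.5))        # True
print(np.allclose(sines**2 + cosines**2, 1.0))   # True
print(np.tanh(0.0), np.cosh(0.0))                # 0.0 1.0
```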

#### Number Theory:
- `np.binary_repr()` to convert an integer to a string of binary digits

In [14]:
np.binary_repr(10)

'1010'

- `np.diff()` to calculate the differences between consecutive terms of a sequence; the `n` argument applies the differencing repeatedly

In [15]:
np.diff([1, 4, 9, 16])

array([3, 5, 7])

In [16]:
np.diff([1, 4, 9, 16], n=2)

array([2, 2])

- `np.gcd()` for the greatest common divisor

In [17]:
np.gcd(8, 100)

4
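A few more number-theory helpers, sketched with arbitrary example values. `np.gcd` and `np.lcm` are ufuncs, so they work element-wise and support `.reduce()` over a whole sequence:

```python
import numpy as np

print(np.binary_repr(3, width=4))    # 0011: zero-padded to 4 digits
print(int(np.binary_repr(10), 2))    # 10: Python's int(..., 2) converts back

seq = np.array([1, 4, 9, 16])
# n=2 gives the same result as applying diff twice.
print(np.array_equal(np.diff(seq, n=2), np.diff(np.diff(seq))))   # True

print(np.gcd([12, 20, 35], 10))      # element-wise gcd: 2, 10 and 5
print(np.gcd.reduce([12, 18, 30]))   # 6: gcd of all the elements
print(np.lcm(4, 6))                  # 12: least common multiple
```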

#### Array Logic:
- `np.bitwise_not()`
- `np.bitwise_and()`

In [18]:
np.bitwise_and([True, False, True], [False, True, True])

array([False, False,  True])

- `np.bitwise_or()`
- `np.bitwise_xor()`
- `np.concatenate()`

In [19]:
np.concatenate([[1, 2], [3, 4]])

array([1, 2, 3, 4])
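A sketch of the remaining logic functions. On boolean arrays they act like element-wise AND/OR/XOR/NOT, and the operators `&`, `|`, `^` and `~` call the same functions. On integers they work bit by bit. `np.concatenate()` also accepts an `axis` argument for multi-dimensional arrays:

```python
import numpy as np

a = np.array([True, False, True])
b = np.array([False, True, True])

# & is shorthand for np.bitwise_and on arrays.
print(np.array_equal(a & b, np.bitwise_and(a, b)))   # True
print(np.bitwise_or(a, b))    # [ True  True  True]
print(np.bitwise_xor(a, b))   # [ True  True False]
print(np.bitwise_not(a))      # [False  True False]

# On integers the operations act on the binary digits: 12 = 0b1100, 10 = 0b1010.
print(np.bitwise_and(12, 10), np.bitwise_or(12, 10), np.bitwise_xor(12, 10))  # 8 14 6

# concatenate joins along an existing axis (axis=0 by default); axis=1 joins side by side.
m = np.array([[1, 2], [3, 4]])
print(np.concatenate([m, m], axis=1))
```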

#### Complex Numbers:
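- `complex()` (Python's built-in constructor; the old alias `np.complex` was deprecated in NumPy 1.20 and removed in NumPy 1.24)

In [20]:
complex(2, -3)

(2-3j)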
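NumPy arrays can hold complex numbers directly (with dtype `complex128`), and NumPy provides element-wise helpers for their parts. A short sketch with arbitrary values:

```python
import numpy as np

z = np.array([2 - 3j, 3 + 4j])
print(z.dtype)        # complex128
print(np.real(z))     # [2. 3.]
print(np.imag(z))     # [-3.  4.]
print(np.conj(z))     # complex conjugates
print(np.abs(z))      # moduli: sqrt(13) and 5
print(np.angle(1j))   # argument in radians: pi/2, about 1.5708
```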

#### Data Analysis:
- `np.histogram()`

In [21]:
np.histogram([1, 2])

(array([1, 0, 0, 0, 0, 0, 0, 0, 0, 1]),
 array([1. , 1.1, 1.2, 1.3, 1.4, 1.5, 1.6, 1.7, 1.8, 1.9, 2. ]))
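`np.histogram()` returns a tuple of bin counts and bin edges, where the edges array is one element longer than the counts. By default it uses 10 equal-width bins, as in the output above; the `bins` argument sets either the number of bins or their edges. A sketch with made-up data:

```python
import numpy as np

data = np.array([1, 2, 2, 3, 3, 3])

# Three equal-width bins spanning the data range [1, 3].
counts, edges = np.histogram(data, bins=3)
print(counts)       # [1 2 3]
print(len(edges))   # 4

# Explicit edges give bins [0, 2) and [2, 4]; only the last bin includes its right edge.
counts2, _ = np.histogram(data, bins=[0, 2, 4])
print(counts2)      # [1 5]
```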

#### Logarithms:
- `np.exp()` for the exponential function $e^x$

In [22]:
np.exp(2)

7.38905609893065

- `np.log()` for the natural logarithm (base $e$)

In [23]:
np.log(10)

2.302585092994046
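A sketch of related functions: `np.log10()` and `np.log2()` give base-10 and base-2 logarithms, and `np.log()` returns special values instead of raising for inputs outside its domain. Here `np.errstate` is used only to silence the `RuntimeWarning` NumPy would otherwise print:

```python
import numpy as np

print(np.isclose(np.log(np.exp(2)), 2))   # True: log undoes exp
print(np.log10(1000), np.log2(8))         # base-10 and base-2 logarithms
print(np.exp(np.log(10)))                 # roughly 10, up to rounding

# Outside the domain, NumPy returns special values instead of raising.
with np.errstate(divide="ignore", invalid="ignore"):
    print(np.log(0.0))    # -inf
    print(np.log(-1.0))   # nan
```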

#### Linear Algebra:

`np.linalg` is an incredibly useful module for matrix mathematics, which we shall need in future lessons!
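As a small preview, here is a sketch that solves the linear system $2x + y = 3$, $x + 3y = 5$ (an example system chosen for illustration) with `np.linalg.solve()` and computes the determinant of its coefficient matrix:

```python
import numpy as np

A = np.array([[2.0, 1.0],
              [1.0, 3.0]])
b = np.array([3.0, 5.0])

x = np.linalg.solve(A, b)       # solves A @ x = b
print(x)                        # approximately [0.8 1.4]
print(np.allclose(A @ x, b))    # True
print(np.linalg.det(A))         # about 5.0 (= 2*3 - 1*1)
```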

### More Tools

See [the NumPy user guide](https://numpy.org/devdocs/user/basics.html) for more information about NumPy. Let's go over some of its points:

#### [More numeric data types than base Python](https://numpy.org/devdocs/user/basics.types.html)
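Unlike Python's single `int` and `float` types, NumPy lets you choose the size of each element through the `dtype` argument. A sketch with a few arbitrary choices; note that array arithmetic stays within the array's dtype, so fixed-size integers can wrap around:

```python
import numpy as np

a = np.array([1, 2, 3], dtype=np.int8)      # 8-bit signed integers
b = np.array([1, 2, 3], dtype=np.float32)   # 32-bit floats
print(a.dtype, a.itemsize)                  # int8 1  (bytes per element)
print(b.dtype, b.itemsize)                  # float32 4

info = np.iinfo(np.int8)                    # limits of the int8 type
print(info.min, info.max)                   # -128 127

# The result keeps dtype int8, so 127 + 1 wraps around.
print(np.array([127], dtype=np.int8) + 1)   # [-128]
print(a.astype(np.float64))                 # [1. 2. 3.]
```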

#### Intrinsic array constructors:

In [24]:
print(np.zeros(10))
print(np.ones(10))
print(np.arange(10, dtype=float))
print(np.linspace(0.1, 1, 10))

[0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
[1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
[0. 1. 2. 3. 4. 5. 6. 7. 8. 9.]
[0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1. ]
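A few more constructors, sketched with arbitrary sizes. The constructors accept a tuple shape for multi-dimensional arrays, and `np.arange()` and `np.linspace()` differ in how they treat the end point:

```python
import numpy as np

print(np.zeros((2, 3)))         # a tuple shape gives a 2-D array of zeros
print(np.full(3, 7))            # [7 7 7]
print(np.eye(2))                # 2x2 identity matrix

# arange takes a step size and excludes the stop value;
# linspace takes a number of points and includes the stop value.
print(np.arange(0, 1, 0.25))    # [0.   0.25 0.5  0.75]
print(np.linspace(0, 1, 5))     # [0.   0.25 0.5  0.75 1.  ]
```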


#### Multi-dimensional indexing:

In [25]:
nums = np.array([[1, 2, 3], [4, 5, 6]])
nums.shape

(2, 3)

In [26]:
nums[0, 2]

3

Why is this more efficient than `nums[0][2]`?

In [27]:
%timeit nums[0, 2]

171 ns ± 1.06 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)


In [28]:
%timeit nums[0][2]

345 ns ± 12.9 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
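Part of the answer: `nums[0][2]` performs two separate indexing operations and first builds an intermediate 1-D view `nums[0]`, while `nums[0, 2]` reaches the element in a single step. The sketch below shows that intermediate view; tuple indexing also allows slicing along several axes at once:

```python
import numpy as np

nums = np.array([[1, 2, 3], [4, 5, 6]])

row = nums[0]                               # the first step of nums[0][2]
print(row, np.shares_memory(row, nums))     # [1 2 3] True: a view into nums
print(nums[0, 2] == row[2])                 # True: the same element either way

# Slicing along both axes in one indexing operation.
print(nums[:, 1])     # second column: [2 5]
print(nums[1, 1:])    # [5 6]
```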


#### Filtering:

In [29]:
data = np.array([10, 3, 4, 7, 6])

In [30]:
data[data < 5]

array([3, 4])
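The expression `data < 5` produces a boolean mask, and indexing with the mask keeps the `True` positions. Conditions can be combined with `&` (and) and `|` (or); the parentheses are required because these operators bind more tightly than comparisons. A sketch using the same `data`:

```python
import numpy as np

data = np.array([10, 3, 4, 7, 6])

mask = data < 5
print(mask)                              # [False  True  True False False]
print(data[(data > 3) & (data < 8)])     # [4 7 6]
print(data[(data < 4) | (data > 9)])     # [10  3]
```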

#### Broadcasting:

In [31]:
arr1 = np.array([-1, -2, -3])
arr2 = -8

In [32]:
arr1 + arr2

array([ -9, -10, -11])

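Two arrays can be broadcast together if, comparing their shapes from the last (rightmost) dimension backwards, each pair of dimensions is *equal* or one of them is *1*; a missing dimension counts as 1. In the next example, `arr3` has shape `(2, 3)` and `arr4` has shape `(3,)`, so `arr4` is applied to each row of `arr3`.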

In [33]:
arr3 = np.array([[-10., 3., 175.2], [25., 1.47, 9.36]])
arr4 = np.array([5, 5, 5])

In [34]:
arr3 * arr4

array([[-50.  ,  15.  , 876.  ],
       [125.  ,   7.35,  46.8 ]])
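A sketch of the rule with a fresh `(2, 3)` array of ones, here called `grid`: shapes `(3,)` and `(2, 1)` broadcast against it, while `(2,)` does not, because its last dimension (2) neither matches 3 nor equals 1:

```python
import numpy as np

grid = np.ones((2, 3))

# Last dimensions 3 and 3 match, so the (3,) array is added to every row.
print((grid + np.array([1, 2, 3])).shape)      # (2, 3)

# (2, 1) vs (2, 3): the size-1 axis is stretched across the columns.
print((grid + np.array([[10], [20]])).shape)   # (2, 3)

# (2,) vs (2, 3): last dimensions 2 and 3 differ and neither is 1.
try:
    grid + np.array([1, 2])
except ValueError as err:
    print("ValueError:", err)
```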

#### np.nan and np.inf

NaN stands for "not a number" and is often used to mark missing or undefined values. NumPy's `np.nan` is a handy way of representing them, in part because `np.nan` *is a float!*

In [35]:
type(np.nan)

float

This makes it convenient to perform mathematical operations on arrays that contain NaNs.

In [36]:
arr5 = np.array([1, 10, np.nan])

In [37]:
arr5.mean()

nan

Even though the array has a NaN, calculating its mean doesn't raise an error; the result is simply NaN, because arithmetic involving NaN returns NaN. Moreover, we can do this:

In [38]:
np.nansum(arr5) / len(arr5)

3.6666666666666665

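Note that `np.nansum()` treats the NaN as zero, so dividing by `len(arr5)` still counts it. To ignore NaNs entirely, use `np.nanmean()`: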

In [39]:
np.nanmean(arr5)

5.5
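Because NaN is not equal to anything, not even itself, NaNs have to be found with `np.isnan()` rather than `==`. A short sketch that uses it to drop the NaN entries from the same `arr5`:

```python
import numpy as np

arr5 = np.array([1, 10, np.nan])

print(np.nan == np.nan)                    # False
print(np.isnan(arr5))                      # [False False  True]

clean = arr5[~np.isnan(arr5)]              # keep only the non-NaN entries
print(clean.mean())                        # 5.5, the same as np.nanmean(arr5)
print(np.count_nonzero(~np.isnan(arr5)))   # 2 non-NaN values
```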

In [40]:
np.inf

inf

In [41]:
np.isfinite(np.inf)

False

In [42]:
def inv(x):
    # inverse square, 1 / x**2
    return x**(-2)

In [43]:
inv(0)

ZeroDivisionError: 0.0 cannot be raised to a negative power

In [44]:
def inverse(x):
    # inverse square, returning infinity instead of raising at x = 0
    if x == 0:
        val = np.inf
    else:
        val = x**(-2)
    return val

In [45]:
inverse(0)

inf
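The `inverse()` function handles one number at a time with an `if` statement. For NumPy float arrays no special case is needed: zero entries produce `inf` (with a `RuntimeWarning`, silenced here with `np.errstate`) instead of raising an error. A sketch with arbitrary inputs:

```python
import numpy as np

x = np.array([0.0, 0.5, 2.0])

# 0.0 ** -2.0 gives inf instead of raising ZeroDivisionError.
with np.errstate(divide="ignore"):
    result = x ** -2.0

print(result)               # inf, 4.0 and 0.25
print(np.isfinite(result))  # [False  True  True]
```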

## Exercises

1. Write down descriptions of what these methods do:

- `all()`
- `any()`
- `cumprod()`
- `cumsum()`
- `reshape()`
- `shape` (no parentheses! This is an attribute, not a method)

2. Calculate the following using numpy:

- $\cos\left(\frac{\pi}{3}\right)$
- $\cosh\left(\frac{\pi}{3}\right)$

3. Write a function that will return the sum of the first $n$ terms of an array, where the user inputs both $n$ and the array.

4. Write a function that will return the logarithm of the standard deviation of an input array.

5. Use numpy to multiply the complex numbers $257 + 134i$ and $987 - 643i$.

6. **Euler's Identity.** How could we use numpy to test Euler's identity $e^{\pi i} + 1 = 0$, the special case $x = \pi$ of Euler's formula $e^{ix} = \cos x + i\sin x$?