# Libraries in General and NumPy in Particular

## 1. Importing Python Libraries

Part of the reason why Python is such a powerful tool for data science is that other people have written and optimized functions and wrapped them into **libraries** that we can bring into our own work.

![numpy](https://raw.githubusercontent.com/donnemartin/data-science-ipython-notebooks/master/images/numpy.png)

[NumPy](https://www.numpy.org/) is the fundamental package for scientific computing with Python. 


To use a package in your current workspace, type `import` followed by the name of the library, as shown below.

In [1]:
import numpy

That worked because numpy is [included with Anaconda](https://docs.anaconda.com/anaconda/packages/py3.7_osx-64/), so numpy was installed when you installed Anaconda. Other packages will need to be installed before you can use them.

Let's try importing azure (which is a library for using MS Azure in Python).

In [2]:
import azure

ModuleNotFoundError: No module named 'azure'

That failed because azure is **not** included with Anaconda. In order to get this package, we'll have to install it first:

In [3]:
# I'll install this using PIP, the package installer for Python.
# This is a bash command, but I can run this in the notebook by
# prefixing the command with a '!' ("bang"):

!pip install azure

Collecting azure
  Downloading https://files.pythonhosted.org/packages/5a/5e/f94bd8091ab0ccc68ee902e188d4e8cd3b38dc77155a1c187cbf4767d0e4/azure-4.0.0-py2.py3-none-any.whl
Collecting azure-batch~=4.1 (from azure)
  Downloading https://files.pythonhosted.org/packages/b1/fa/1053b5dcd88e5de8e8cd70a4d7189ffad037542963ea86b518deb612c498/azure_batch-4.1.3-py2.py3-none-any.whl (314kB)
Collecting azure-storage-file~=1.3 (from azure)
  Downloading https://files.pythonhosted.org/packages/c9/33/6c611563412ffc409b2413ac50e3a063133ea235b86c137759774c77f3ad/azure_storage_file-1.4.0-py2.py3-none-any.whl
Collecting azure-datalake-store~=0.0.18 (from azure)
  Downloading https://files.pythonhosted.org/packages/27/9a/e7140775b3f8f011ef5d001c12a3519310094375671950105519e30bb12b/azure_datalake_store-0.0.48-py2.py3-none-any.whl (53kB)
Collecting az

Collecting azure-mgmt-devspaces~=0.1.0 (from azure-mgmt~=4.0->azure)
  Downloading https://files.pythonhosted.org/packages/f4/d3/ff9c39578f24fb4bd158a882788c7795b177cf3f23ce7cf1d8b52066a64e/azure_mgmt_devspaces-0.1.0-py2.py3-none-any.whl
Collecting azure-mgmt-advisor~=1.0 (from azure-mgmt~=4.0->azure)
  Downloading https://files.pythonhosted.org/packages/cb/f3/a86ba3e0784d12c8fe5cbf1f24e1b9255575a2f0892e08c46cddd0795dfd/azure_mgmt_advisor-1.0.1-py2.py3-none-any.whl
Collecting azure-mgmt-maps~=0.1.0 (from azure-mgmt~=4.0->azure)
  Downloading https://files.pythonhosted.org/packages/e4/04/c64326729e842f3eab1fd527f7582e269e4b0e5b9324a4562edaf0371953/azure_mgmt_maps-0.1.0-py2.py3-none-any.whl
Collecting azure-mgmt-batchai~=2.0 (from azure-mgmt~=4.0->azure)
  Downloading https://files.pythonhosted.org/packages/d9/a5/ab796c2a490155c14f9ac4240724ca5c56723315d4dc753030712e6f2e80/azure_mgmt_batchai-2.0.0-py2.py3-no

Collecting azure-mgmt-scheduler~=2.0 (from azure-mgmt~=4.0->azure)
  Downloading https://files.pythonhosted.org/packages/e8/55/f3490698ff622244438e667e0321808e84389e69c5fb77a1d6db869d9bb7/azure_mgmt_scheduler-2.0.0-py2.py3-none-any.whl (67kB)
Collecting azure-mgmt-devtestlabs~=2.2 (from azure-mgmt~=4.0->azure)
  Downloading https://files.pythonhosted.org/packages/2f/93/a64abaede2fc6a52476af8ceab9cedb368c49e948d9385cbe7cd4ce5ffff/azure_mgmt_devtestlabs-2.2.0-py2.py3-none-any.whl (194kB)
Collecting azure-mgmt-authorization~=0.50.0 (from azure-mgmt~=4.0->azure)
  Downloading https://files.pythonhosted.org/packages/6f/17/55b974603c16be89c7a7c16bac57b7bce48527bf1bebc3f116f7215176e6/azure_mgmt_authorization-0.50.0-py2.py3-none-any.whl (81kB)

Collecting azure-mgmt-cosmosdb~=0.4.1 (from azure-mgmt~=4.0->azure)
  Downloading https://files.pythonhosted.org/packages/17/ed/d97a04c8c26e2432f2cf000b711daeb053e1cfbdb046bd5d070c941b4fcb/azure_mgmt_cosmosdb-0.4.1-py2.py3-none-any.whl (100kB)
Collecting azure-mgmt-datafactory~=0.6.0 (from azure-mgmt~=4.0->azure)
  Downloading https://files.pythonhosted.org/packages/cf/01/32a6ad5ad348d965f7c106d819a1f6dc613f6aa98a720ffc529ef468016b/azure_mgmt_datafactory-0.6.0-py2.py3-none-any.whl (418kB)
Collecting PyJWT>=1.0.0 (from adal>=0.4.2->azure-datalake-store~=0.0.18->azure)
  Downloading https://files.pythonhosted.org/packages/87/8b/6a9f14b5f781697e51259d81657e6048fd31a113229cf346880bb7545565/PyJWT-1.7.1-py2.py3-none-any.whl
Collecting isodate>=0.6.0 (from msrest>=

Now I can import the library:

In [4]:
import azure
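
Whether a library is available can also be checked before importing it, which avoids the `ModuleNotFoundError`. The following is a minimal sketch using the standard library's `importlib.util.find_spec`; the helper name `is_installed` and the second module name are made up for illustration.

```python
import importlib.util


def is_installed(module_name):
    """Return True if a top-level module can be located (it is not imported)."""
    return importlib.util.find_spec(module_name) is not None


print(is_installed("numpy"))
# Hypothetical package name, chosen so that it should not exist anywhere.
print(is_installed("surely_not_a_real_package_xyz"))
```

A check like this is handy in scripts that should print a friendly installation hint instead of failing with a traceback.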

Many packages have standard import aliases. We effect this aliasing by using the Python keyword `as`. For numpy, the standard alias is `np`.

In [6]:
import numpy as np

x = np.array([1,2,3])
print(x)
type(x)

[1 2 3]


numpy.ndarray

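We could use almost any alias we like, but not Python keywords such as `for` or `class`: `import numpy as for` is a syntax error. Names of built-ins such as `list` or `sum` are allowed, but using one as an alias would shadow that built-in for the rest of the session.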
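
The difference between keywords and built-in names as aliases can be checked directly. This is a small sketch, not part of the original lesson: it uses `math` as a stand-in library and runs the import statements through `compile` and `exec` so that the experiment does not disturb the real namespace.

```python
import keyword
import math

# A keyword such as `for` cannot be an alias: the statement does not even parse.
try:
    compile("import math as for", "<example>", "exec")
    keyword_alias_ok = True
except SyntaxError:
    keyword_alias_ok = False

# A built-in name such as `list` is not a keyword, so it is accepted as an alias.
# Inside this throwaway namespace, `list` now refers to the math module.
namespace = {}
exec("import math as list", namespace)
shadowed = namespace["list"] is math

print(keyword.iskeyword("for"), keyword_alias_ok)  # True False
print(keyword.iskeyword("list"), shadowed)         # False True
```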

## 2. NumPy versus base Python

Now that we know libraries exist, why would we want to use them? Let us compare base Python with NumPy.

Base Python has lists and can do basic math. NumPy adds a more specialized object, the array.

Arrays have a few advantages over lists, which we will look at now.

In [7]:
names_list = ['Bob', 'John', 'Sally']

# np.array works for numbers and strings alike; np.char.array additionally
# provides vectorized string methods

names_array = np.char.array(['Bob', 'John', 'Sally'])

print(names_list)
print(names_array)

['Bob', 'John', 'Sally']
['Bob' 'John' 'Sally']


In [11]:
# Make a list and an array of three numbers; name them numbers_list and
# numbers_array, since later cells refer to those names

#your code here


In [12]:
# multiply your array by 3


In [13]:
# multiply your list by 3



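NumPy arrays support element-wise arithmetic: multiplying an array by 3 multiplies each element by 3, whereas multiplying a list by 3 repeats the list three times. Arrays also support the division operator (`/`), which lists do not. There are other reasons to prefer NumPy over base Python for working with data.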
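
The difference between list and array arithmetic is easy to see side by side. The sketch below uses three arbitrary numbers; the names `numbers_list` and `numbers_array` match the ones used later in this lesson.

```python
import numpy as np

numbers_list = [1, 2, 3]
numbers_array = np.array(numbers_list)

# Multiplying a list repeats it; multiplying an array scales every element.
print(numbers_list * 3)             # [1, 2, 3, 1, 2, 3, 1, 2, 3]
print((numbers_array * 3).tolist())  # [3, 6, 9]

# Division works element-wise on arrays but is undefined for lists.
print((numbers_array / 2).tolist())  # [0.5, 1.0, 1.5]
try:
    numbers_list / 2
    list_division_ok = True
except TypeError:
    list_division_ok = False
print(list_division_ok)             # False
```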

The code below compares the speed of element-wise addition in base Python (here on `range` objects, which are indexed like lists) with the same operation on NumPy arrays.

In [14]:
size_of_vec = 1000

X = range(size_of_vec)
Y = range(size_of_vec)

In [15]:
%timeit [X[i] + Y[i] for i in range(len(X))]

198 µs ± 1.4 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)


In [16]:
X = np.arange(size_of_vec)
Y = np.arange(size_of_vec)

In [17]:
%timeit X + Y

882 ns ± 94.3 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
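
In the timings above, the array addition is more than 200 times faster than the list comprehension (198 µs versus 882 ns). `%timeit` is an IPython magic, so it only works in a notebook or IPython shell. Outside of one, a comparable measurement can be sketched with the standard library's `timeit` module; the function names below are made up for illustration, and the numbers will differ from machine to machine.

```python
import timeit

import numpy as np

size_of_vec = 1000
list_x = list(range(size_of_vec))
list_y = list(range(size_of_vec))
array_x = np.arange(size_of_vec)
array_y = np.arange(size_of_vec)


def add_lists():
    # Element-by-element addition in base Python.
    return [list_x[i] + list_y[i] for i in range(len(list_x))]


def add_arrays():
    # Vectorized addition: the loop runs inside NumPy's compiled code.
    return array_x + array_y


runs = 1000
list_time = timeit.timeit(add_lists, number=runs)
array_time = timeit.timeit(add_arrays, number=runs)
print(f"list:  {list_time / runs * 1e6:.2f} µs per call")
print(f"array: {array_time / runs * 1e6:.2f} µs per call")
```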


## 3. What Else Can Numpy Do?

Type `numbers_list.` and then hit `TAB`. What options do you have?

In [21]:
numbers_list.

The names of standard Python list attributes and methods appear:

- `append(x)` (add x to the end of the list)
- `clear()` (delete all elements of the list)
- `copy()` (make a copy of the list)
- `count(x)` (return the number of instances of x in the list)
- `extend([x, y])` (add x and y to the end of the list)
- `index(x)` (return the position in the list of the first x)
- `insert(x, y)` (insert y into position x in the list)
- `pop(i=-1)` (remove and return the element at position i in the list)
- `remove(x)` (remove the first x from the list)
- `reverse()` (reverse the order of the elements of the list)
- `sort()` (sort the elements of the list)

Now type `numbers_array.` and then hit `TAB`. What options do you have?

In [24]:
numbers_array.

Now there are many new options!

Exercise: Write down quick one-liners to describe what these methods do:


- `max()`
- `mean()`
- `min()`
- `round()`
- `std()`
- `sum()`

### Better Math Tools

#### Trigonometry:
- `np.pi` for $\pi$

In [25]:
np.pi

3.141592653589793

- `np.sin()` for the sine function

In [26]:
np.sin(np.pi / 6)

0.49999999999999994

- `np.cos()` for the cosine function
- `np.tan()` for the tangent function
- `np.sinh()` for the hyperbolic sine function
- `np.cosh()` for the hyperbolic cosine function
- `np.tanh()` for the hyperbolic tangent function
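
These functions all work element-wise on whole arrays, with angles given in radians. As a quick sanity check (a sketch with arbitrarily chosen angles), the identities $\sin^2 x + \cos^2 x = 1$ and $\cosh^2 x - \sinh^2 x = 1$ can be verified numerically:

```python
import numpy as np

angles = np.array([0.0, np.pi / 6, np.pi / 4, np.pi / 2])

# Apply each function to the whole array at once.
circular = np.sin(angles) ** 2 + np.cos(angles) ** 2
hyperbolic = np.cosh(angles) ** 2 - np.sinh(angles) ** 2

# Compare with a tolerance, since floating-point results are rarely exact.
print(np.allclose(circular, 1.0))    # True
print(np.allclose(hyperbolic, 1.0))  # True
```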

#### Number Theory:
- `np.binary_repr()` to convert from decimal to binary

In [27]:
np.binary_repr(10)

'1010'

- `np.diff()` to calculate the differences between consecutive terms of a sequence (pass `n` to repeat the differencing `n` times)

In [28]:
np.diff([1, 4, 9, 16])

array([3, 5, 7])

In [29]:
np.diff([1, 4, 9, 16], n=2)

array([2, 2])

- `np.gcd()` for the greatest common divisor

In [30]:
np.gcd(8, 100)

4

#### Array Logic:
- `np.bitwise_not()`
- `np.bitwise_and()`

In [31]:
np.bitwise_and([True, False, True], [False, True, True])

array([False, False,  True])

- `np.bitwise_or()`
- `np.bitwise_xor()`
- `np.concatenate()`

In [32]:
np.concatenate([[1, 2], [3, 4]])

array([1, 2, 3, 4])
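
The bitwise functions also operate on integers, where they act on the binary digits of each element. The following sketch uses small, arbitrarily chosen integers (12 is `1100` and 10 is `1010` in binary):

```python
import numpy as np

a = np.array([12, 5])
b = np.array([10, 3])

print(np.bitwise_and(a, b).tolist())  # [8, 1]
print(np.bitwise_or(a, b).tolist())   # [14, 7]
print(np.bitwise_xor(a, b).tolist())  # [6, 6]

# On booleans, bitwise_not is a logical NOT; on signed integers it maps x to -x - 1.
print(np.bitwise_not(np.array([True, False])).tolist())  # [False, True]
print(np.bitwise_not(np.array([5])).tolist())            # [-6]
```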

#### Complex Numbers:
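- `complex()`: Python's built-in complex type. (`np.complex` was merely an alias for it; the alias was deprecated in NumPy 1.20 and removed in NumPy 1.24. NumPy's own complex types are `np.complex64` and `np.complex128`.)

In [33]:
complex(2, -3)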

(2-3j)
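
Arrays can hold complex numbers too, in which case NumPy picks a complex dtype and offers element-wise access to real parts, imaginary parts, moduli, and conjugates. A short sketch with two arbitrary values:

```python
import numpy as np

z = np.array([3 + 4j, 1 - 1j])

print(z.dtype)               # complex128
print(z.real.tolist())       # [3.0, 1.0]
print(z.imag.tolist())       # [4.0, -1.0]
print(np.abs(z)[0])          # 5.0
print(np.conj(z).tolist())   # [(3-4j), (1+1j)]
```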

#### Data Analysis:
- `np.histogram()`

In [34]:
np.histogram([1, 2])

(array([1, 0, 0, 0, 0, 0, 0, 0, 0, 1]),
 array([1. , 1.1, 1.2, 1.3, 1.4, 1.5, 1.6, 1.7, 1.8, 1.9, 2. ]))
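
`np.histogram` returns two arrays: the counts per bin and the bin edges. By default it uses 10 equal-width bins spanning the data, which is why the output above has 10 counts and 11 edges. The number of bins can be set with the `bins` parameter, as in this sketch with made-up data:

```python
import numpy as np

samples = [1, 2, 2, 3, 3, 3]

# Three equal-width bins between the minimum (1) and the maximum (3).
counts, edges = np.histogram(samples, bins=3)

print(counts.tolist())  # [1, 2, 3]
print(len(edges))       # 4
```

There is always one more edge than there are bins, and the last bin includes its right edge.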

#### Logarithms:
- `np.exp()` for the exponential function $e^x$

In [35]:
np.exp(2)

7.38905609893065

- `np.log()` for the natural (base-$e$) logarithm; `np.log10()` and `np.log2()` cover bases 10 and 2

In [36]:
np.log(10)

2.302585092994046
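
`np.log` is the natural logarithm, so it undoes `np.exp`. Logarithms in other bases are available through `np.log10` and `np.log2`, or through the change-of-base formula $\log_b x = \ln x / \ln b$. A brief sketch with arbitrary inputs:

```python
import numpy as np

values = np.array([1.0, 10.0, 100.0])

# np.log and np.exp are inverses of each other.
round_trip = np.log(np.exp(2.0))
print(np.isclose(round_trip, 2.0))  # True

# Base-10 logarithms, computed directly and via change of base.
direct = np.log10(values)
via_change_of_base = np.log(values) / np.log(10.0)
print(np.allclose(direct, [0.0, 1.0, 2.0]))     # True
print(np.allclose(direct, via_change_of_base))  # True
```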

#### Linear Algebra:

`np.linalg` is an incredibly useful module for matrix mathematics, which we shall need in future lessons!
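
As a small taste of `np.linalg`, the sketch below solves the linear system $2x + y = 3$, $x + 3y = 5$. The system is made up for illustration.

```python
import numpy as np

# Coefficient matrix and right-hand side of the system A @ solution = b.
A = np.array([[2.0, 1.0],
              [1.0, 3.0]])
b = np.array([3.0, 5.0])

solution = np.linalg.solve(A, b)
print(solution)

# Check the answer by substituting it back into the system.
print(np.allclose(A @ solution, b))  # True
```

The solution is $x = 0.8$, $y = 1.4$, up to floating-point rounding.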

### More Tools

See [here](https://numpy.org/devdocs/user/basics.html) for more information about numpy. Let's go over some of these points:

#### [More numeric data types than base Python](https://numpy.org/devdocs/user/basics.types.html)

#### Intrinsic array constructors:

In [37]:
print(np.zeros(10))
print(np.ones(10))
print(np.arange(10, dtype=float))
print(np.linspace(0.1, 1, 10))

[0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
[1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
[0. 1. 2. 3. 4. 5. 6. 7. 8. 9.]
[0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1. ]


#### Multi-dimensional indexing:

In [38]:
nums = np.array([[1, 2, 3], [4, 5, 6]])
nums.shape

(2, 3)

In [39]:
nums[0, 2]

3

Why is this more efficient than `nums[0][2]`?

In [40]:
%timeit nums[0, 2]

120 ns ± 2.75 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)


In [41]:
%timeit nums[0][2]

234 ns ± 4.02 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
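
One way to see where the extra time goes: `nums[0][2]` is two indexing operations. The first builds a temporary one-dimensional array for row 0, and the second indexes into that temporary. `nums[0, 2]` reaches the element in a single step. The sketch below makes the intermediate object visible:

```python
import numpy as np

nums = np.array([[1, 2, 3], [4, 5, 6]])

# Step 1 of nums[0][2]: a temporary array object that views row 0 of nums.
row = nums[0]
print(type(row).__name__)  # ndarray
print(row.base is nums)    # True

# Step 2: index into the temporary. The result matches the one-step form.
print(row[2] == nums[0, 2])  # True
```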


#### Filtering:

In [42]:
data = np.array([10, 3, 4, 7, 6])

In [43]:
data[data < 5]

array([3, 4])
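
Filtering works because the comparison `data < 5` itself produces an array of booleans, which is then used as an index. Conditions can be combined with `&` (and) and `|` (or), each wrapped in parentheses, and `np.where` replaces values instead of dropping them. A sketch using the same `data` array:

```python
import numpy as np

data = np.array([10, 3, 4, 7, 6])

# The comparison yields one boolean per element.
mask = data < 5
print(mask.tolist())  # [False, True, True, False, False]

# Combine conditions element-wise; the parentheses are required.
print(data[(data > 3) & (data < 8)].tolist())  # [4, 7, 6]

# Keep the array's length, replacing values below 5 with 0.
print(np.where(data < 5, 0, data).tolist())    # [10, 0, 0, 7, 6]
```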

#### Broadcasting:

In [44]:
arr1 = np.array([-1, -2, -3])
arr2 = -8

In [45]:
arr1 + arr2

array([ -9, -10, -11])

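Two arrays can be broadcast together if, comparing their shapes from the last dimension backward, each pair of dimensions either has *the same* value or one of them has a value of *1*. Missing leading dimensions are treated as 1.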

In [46]:
arr3 = np.array([[-10., 3., 175.2], [25., 1.47, 9.36]])
arr4 = np.array([5, 5, 5])

In [47]:
arr3 * arr4

array([[-50.  ,  15.  , 876.  ],
       [125.  ,   7.35,  46.8 ]])
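
The broadcasting rule can be explored by checking the shapes of a few results. In this sketch the array names and values are made up; the last case fails because the trailing dimensions (3 and 2) are neither equal nor 1.

```python
import numpy as np

grid = np.ones((2, 3))
row = np.array([1, 2, 3])         # shape (3,)
column = np.array([[10], [20]])   # shape (2, 1)

print((grid * row).shape)       # (2, 3)
print((grid * column).shape)    # (2, 3)
print((row + column).tolist())  # [[11, 12, 13], [21, 22, 23]]

# Shapes (2, 3) and (2,) do not line up from the right, so this raises.
try:
    grid * np.array([1, 2])
    compatible = True
except ValueError:
    compatible = False
print(compatible)               # False
```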

#### np.nan and np.inf

NaN stands for "not a number". NumPy's `np.nan` is a handy way of representing missing or undefined values, in part because `np.nan` *is a float!*

In [48]:
type(np.nan)

float

This makes it convenient to perform mathematical operations on arrays that contain NaNs.

In [49]:
arr5 = np.array([1, 10, np.nan])

In [50]:
arr5.mean()

nan

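Even though the array contains a NaN, calculating its mean raises no error; the NaN simply propagates, so the result is itself `nan`. NumPy also provides functions that ignore NaNs. For example, `np.nansum` treats NaNs as zero: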

In [51]:
np.nansum(arr5) / len(arr5)

3.6666666666666665

Dividing by `len(arr5)` still counts the NaN entry. To average only the non-NaN values, use `np.nanmean`:

In [52]:
np.nanmean(arr5)

5.5
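
One caveat when looking for NaNs: a NaN is not equal to anything, including itself, so `==` cannot find it. `np.isnan` is the reliable test, as this sketch with the same `arr5` shows:

```python
import numpy as np

arr5 = np.array([1, 10, np.nan])

# NaN compares unequal even to itself.
print(np.nan == np.nan)       # False

# np.isnan gives one boolean per element.
is_missing = np.isnan(arr5)
print(is_missing.tolist())    # [False, False, True]

# Dropping the NaNs by hand reproduces the np.nanmean result.
print(arr5[~is_missing].mean())  # 5.5
```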

In [53]:
np.inf

inf

In [54]:
np.isfinite(np.inf)

False

In [55]:
def inv(x):
    return x**(-1)

In [56]:
inv(0)

ZeroDivisionError: 0.0 cannot be raised to a negative power

In [57]:
def inverse(x):
    # Return infinity instead of raising an error when x is zero
    if x == 0:
        val = np.inf
    else:
        val = x**(-1)
    return val

In [58]:
inverse(0)

inf
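
With arrays, NumPy handles this case on its own: dividing a float array by zero yields `inf` and emits a `RuntimeWarning` rather than raising an exception. The sketch below computes reciprocals of a made-up array and silences the warning with `np.errstate`.

```python
import numpy as np

x = np.array([0.0, 1.0, 2.0])

# Suppress the divide-by-zero warning only inside this block.
with np.errstate(divide="ignore"):
    reciprocal = 1.0 / x

print(reciprocal.tolist())               # [inf, 1.0, 0.5]
print(np.isfinite(reciprocal).tolist())  # [False, True, True]
```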

## Exercises

1. Write down descriptions of what these methods do:

- `all()`
- `any()`
- `cumprod()`
- `cumsum()`
- `reshape()`
- `shape` --> No parentheses! This is an attribute and not a method.

2. Calculate the following using numpy:

- $\cos\left(\frac{\pi}{3}\right)$
- $\cosh\left(\frac{\pi}{3}\right)$

3. Write a function that will return the sum of the first $n$ terms of an array, where the user inputs both $n$ and the array.

4. Write a function that will return the logarithm of the standard deviation of an input array.

5. Use numpy to multiply the complex numbers $257 + 134i$ and $987 - 643i$.

6. **Euler's Identity.** How could we use numpy to test Euler's identity, $e^{\pi i} + 1 = 0$? (Hint: floating-point results are rarely exactly zero, so compare with a tolerance.)