# Section 04: Introducing Libraries: NumPy

- online-ds-ft-070620
- 07/15/20

![libgif](https://media0.giphy.com/media/7E8lI6TkLrvvAcPXso/giphy.gif?cid=790b76115d360a95792e4333770609b8&rid=giphy.gif)

# Resources

- This Repository:  (to clone for activity)
    - https://github.com/jirvingphd/dtsc-ft-070620-sect-04-python-libraries-study-group
    - Check the Solution branch for the solution to the task
    

## QUESTIONS?

- Importing Data Using Pandas - Lab, Level-Up - What is the rationale behind the solution?
- When making graphs, when do you use `plt.title('')` versus `ax.set_title('')`?


## Side Bar: Ways to Improve Your Set Up (nbextensions & VS Code)

#### Jupyter Notebook Extensions
- For full details and recommended extensions & settings see our [notebook from section 01](https://github.com/jirvingphd/fsds_070620_FT_cohort_notes/blob/master/Mod_1/sect_01/sect_01_getting_started.ipynb)

- In your Terminal enter the following 3 commands (one at a time):

```bash
conda activate learn-env
conda install -c conda-forge jupyter_contrib_nbextensions
jupyter nbextension enable jupyter_nbextensions_configurator

```

- If your computer gives an error about conda-forge channel then go to [the packages' documentation](https://jupyter-contrib-nbextensions.readthedocs.io/en/latest/install.html) for the alternative installation instructions using `pip` 
  

#### VS Code

- Download VS Code installer: 
https://code.visualstudio.com/
- "Using Python Environments

# Introduction

#### _Our goals today are to be able to_: <br/>

- Identify and import Python modules and packages (libraries)
- Identify differences between NumPy and base Python in usage and operation
- Create a new module of our own

#### _Big questions for this lesson_: <br/>
- What is a package, what do packages do, and why might we want to use them?
- When do we want to use NumPy?

### Activation:

![excel2](https://raw.githubusercontent.com/jirvingphd/dsc-lp-libraries-numpy/master/img/excelpic2.jpg)

Most people have used Microsoft Excel or Google Sheets. But what are the limitations of Excel?

- [Take a minute to read this article](https://www.bbc.com/news/magazine-22223190)
- Make a list of the problems Excel presents

- **Q: How is using Python different?**

- **A: Python...**
    -  

## 1. Importing Python packages 


In an earlier lesson, we wrote a function to calculate the mean of a list. That was **tedious**. To avoid rewriting it every time, we could store that function in a *Python module* and call it later when we need it. 

And thankfully, other people have _also_ written and optimized functions and wrapped them into **modules** and **packages** (also known as _libraries_).
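As a rough sketch of that idea, a hand-written mean function might look like the one below. The function name `mean` and the file name `my_stats.py` are hypothetical, chosen only for illustration; this is not necessarily the function from the earlier lesson.

```python
# Contents of a hypothetical module file, e.g. my_stats.py (the name is an assumption)
def mean(values):
    """Return the arithmetic mean of a non-empty sequence of numbers."""
    return sum(values) / len(values)


# If this were saved as my_stats.py next to your notebook, `import my_stats`
# would let you call my_stats.mean(...). Here we simply call it directly.
result = mean([4, 3, 25, 40, 62, 20])
print(result)
```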

![numpy](https://raw.githubusercontent.com/donnemartin/data-science-ipython-notebooks/master/images/numpy.png)

[NumPy](https://www.numpy.org/) is the fundamental package for scientific computing with Python. 


To import a package, type `import` followed by the name of the library.


### Terminology

![mod2](https://raw.githubusercontent.com/jirvingphd/dsc-lp-libraries-numpy/master/img/modules2.png)

![packages3](https://raw.githubusercontent.com/jirvingphd/dsc-lp-libraries-numpy/master/img/packages3.png)

![python-fact](https://raw.githubusercontent.com/jirvingphd/dsc-lp-libraries-numpy/master/img/python_def.png)

### pip & the Python Package Index

![pypi](https://raw.githubusercontent.com/jirvingphd/dsc-lp-libraries-numpy/master/img/pypi_packages.png)

### You're not limited to PyPI

Make your own modules
![pipmod](https://raw.githubusercontent.com/jirvingphd/dsc-lp-libraries-numpy/master/img/import_modules.png)

![pippack](https://raw.githubusercontent.com/jirvingphd/dsc-lp-libraries-numpy/master/img/package_redo.png)

### The first library we will import is NumPy



To import a package type `import` followed by the name of the package as shown below.

In [1]:
import numpy # Look, ma! we're importing!
l = [1,2,3]
x=numpy.array([1,2,3])
display(x)

array([1, 2, 3])

#### New type of object

In [2]:
type(x)

numpy.ndarray

#### Alias libraries

Many packages have a canonical way to import them with an abbreviated alias.

In [3]:
import numpy as np # np = alias


y=np.array([4,5,6])
print(y)

[4 5 6]


#### Other standard aliases 

In [4]:
import scipy
import pandas as pd
import matplotlib as mpl
import matplotlib.pyplot as plt
import statsmodels.api as sm

In [5]:
mpl.pyplot.figure()

<Figure size 432x288 with 0 Axes>


### Import specific modules from a larger package

In [6]:
# sometimes we will want to import a specific module from a package;
# these two lines are equivalent ways to do that
import matplotlib.pyplot as plt
from matplotlib import pyplot as plt 
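The same import forms work for any package. The sketch below uses the standard library's `os.path` module so that it runs without matplotlib installed; the path `data/file.csv` is a made-up example.

```python
import os.path               # import the submodule by its full dotted name
from os import path          # bind the same submodule to the name `path`
from os.path import join     # bind a single function from the submodule

# All of these names lead to the same objects
same_module = os.path is path
same_function = join is os.path.join
print(same_module, same_function)

# Build a file path in an operating-system-independent way
csv_path = join("data", "file.csv")
print(csv_path)
```

Which form to use is mostly a readability choice: shorter names are quicker to type, while the full dotted name makes it obvious where a function comes from.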

What happens if we mess with naming conventions? For example, import one of our previous libraries as `print`.


**PLEASE NOTE THAT WE WILL HAVE TO RESET THE KERNEL AFTER RUNNING THIS.**<br> Comment out your code after running it.


In [7]:
# import seaborn as print

In [8]:
help(print)

Help on built-in function print in module builtins:

print(...)
    print(value, ..., sep=' ', end='\n', file=sys.stdout, flush=False)
    
    Prints the values to a stream, or to sys.stdout by default.
    Optional keyword arguments:
    file:  a file-like object (stream); defaults to the current sys.stdout.
    sep:   string inserted between values, default a space.
    end:   string appended after the last value, default a newline.
    flush: whether to forcibly flush the stream.



In [9]:
#Did we get an error? What about when we run the following command?

print(x)

#Restart your kernel and clear cells

[1 2 3]
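To see why such an alias is a problem, the sketch below shadows `print` with the standard library's `math` module (used here in place of seaborn so the example has no extra dependencies). When run at the top level of a notebook or script, deleting the bad name with `del` makes Python fall back to the built-in function again, which can be an alternative to restarting the kernel.

```python
import builtins
import math
import math as print   # deliberately bad: the name `print` now points at a module

shadowed = print is math             # our global `print` is the math module
del print                            # remove the global name we created
restored = print is builtins.print   # name lookup falls back to the built-in

print(shadowed, restored)
```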


#### Helpful links: package documentation

Packages have associated documentation to explain how to use the different tools included in a package.

_Sample of libraries_
- [NumPy](https://docs.scipy.org/doc/numpy/)
- [SciPy](https://docs.scipy.org/doc/scipy/reference/)
- [Pandas](http://pandas.pydata.org/pandas-docs/stable/)
- [Matplotlib](https://matplotlib.org/contents.html)

## 2. NumPy versus base Python

Now that we know packages exist, why do we want to use them? Let us compare base Python and NumPy.

Python has lists, and base Python can do basic arithmetic on individual numbers. NumPy, however, provides helpful objects called **arrays**.

NumPy has a few advantages over base Python, which we will look at.

### NumPy makes math easy

With NumPy we can get the **mean** and other quick statistics of lists and arrays with a single function call.

In [10]:
example = [4,3,25,40,62,20]
print(np.mean(example))

25.666666666666668
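For comparison, here is a small sketch that computes the same mean by hand in base Python and then uses a few other NumPy summary functions on the same list. The variable names are illustrative only.

```python
import numpy as np

example = [4, 3, 25, 40, 62, 20]

# Base Python: compute the mean by hand
manual_mean = sum(example) / len(example)

# NumPy: one function call per statistic, and plain lists are accepted directly
numpy_mean = np.mean(example)
numpy_median = np.median(example)
numpy_std = np.std(example)

print(manual_mean, numpy_mean)
print(numpy_median, numpy_std)
```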


#### Different types of arrays

In [11]:
names_list=['Bob','John','Sally']
names_array=numpy.char.array(['Bob','John','Sally']) # np.array also accepts strings; numpy.char.array is a legacy variant that adds vectorized string methods
print(names_list)
print(names_array)

['Bob', 'John', 'Sally']
['Bob' 'John' 'Sally']


#### Array math in action

In [12]:
# Make a list and an array of four numbers

#your code here
numbers_list = [5,22,33,90]
numbers_array = np.array([5,22,33,90])

In [13]:
# divide your array by 2

numbers_array/2

array([ 2.5, 11. , 16.5, 45. ])

In [14]:
# divide your list by 2

numbers_list/2

TypeError: unsupported operand type(s) for /: 'list' and 'int'

NumPy arrays support the division operator `/` (applied to every element), while Python lists do not. There are other things that make it useful to use NumPy over base Python for evaluating data.
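As a sketch of the difference, the base-Python way to divide every element of a list is a list comprehension, while an array takes the operator directly. The example also shows a related surprise: `*` on a list repeats the list rather than multiplying its elements.

```python
import numpy as np

numbers_list = [5, 22, 33, 90]
numbers_array = np.array(numbers_list)

# Base Python: divide each element with a list comprehension
halved_list = [n / 2 for n in numbers_list]

# NumPy: the operator is applied to every element at once
halved_array = numbers_array / 2

# Careful: `*` on a list repeats the list instead of multiplying its elements
repeated_list = numbers_list * 2
doubled_array = numbers_array * 2

print(halved_list)
print(halved_array)
print(repeated_list)
print(doubled_array)
```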

In [16]:
numbers_array

array([ 5, 22, 33, 90])

In [17]:
# shape tells us the length of the array along each dimension

numbers_array.shape

(4,)
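The trailing comma in `(4,)` marks a one-element tuple: the array has one dimension of length 4. A short sketch comparing a 1-D and a 2-D array (the names `vector` and `matrix` are illustrative) shows how `shape` relates to the `ndim` and `size` attributes:

```python
import numpy as np

vector = np.array([5, 22, 33, 90])
matrix = np.array([[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]])

# shape: length along each dimension; ndim: number of dimensions;
# size: total number of elements
print(vector.shape, vector.ndim, vector.size)
print(matrix.shape, matrix.ndim, matrix.size)
```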

### NumPy array and matrix creation functions

Take 5 minutes and explore each of the following functions.  What does each one do?  What is the syntax of each?
- `np.zeros()`
- `np.ones()`
- `np.full()`
- `np.eye()`
- `np.random.random()`

In [18]:
np.zeros(5)

array([0., 0., 0., 0., 0.])

In [19]:
np.ones(5)

array([1., 1., 1., 1., 1.])

In [20]:
np.full((3,3),3.3)


array([[3.3, 3.3, 3.3],
       [3.3, 3.3, 3.3],
       [3.3, 3.3, 3.3]])

In [21]:
np.eye(6)

array([[1., 0., 0., 0., 0., 0.],
       [0., 1., 0., 0., 0., 0.],
       [0., 0., 1., 0., 0., 0.],
       [0., 0., 0., 1., 0., 0.],
       [0., 0., 0., 0., 1., 0.],
       [0., 0., 0., 0., 0., 1.]])

In [25]:
np.random.choice([1,23,45,76])

23

In [27]:
np.random.random((3,3))

array([[0.95031588, 0.96172772, 0.66996547],
       [0.07656176, 0.24561427, 0.41533608],
       [0.65078026, 0.72037783, 0.7108242 ]])
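A few variations on the calls above, as a sketch: most creation functions accept a shape tuple and an optional `dtype`, and random numbers become reproducible when drawn from a seeded generator. The seed value 42 is an arbitrary choice for illustration.

```python
import numpy as np

zeros_2d = np.zeros((2, 3))              # shape given as a (rows, columns) tuple
ones_int = np.ones(4, dtype=np.int64)    # dtype overrides the default float64
identity = np.eye(3)                     # 3x3 identity matrix

# Two generators created with the same seed produce the same numbers
rng_a = np.random.default_rng(42)
rng_b = np.random.default_rng(42)
sample_a = rng_a.random((3, 3))
sample_b = rng_b.random((3, 3))

print(zeros_2d.shape, ones_int.dtype)
print(np.array_equal(sample_a, sample_b))
```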

### Slicing in NumPy

In [28]:
# We remember slicing from lists
numbers_list = list(range(10))
numbers_list[3:7]

[3, 4, 5, 6]

In [29]:
# Slicing in NumPy Arrays is very similar!
a = np.array([[1,2,3,4], [5,6,7,8], [9,10,11,12]])
a

array([[ 1,  2,  3,  4],
       [ 5,  6,  7,  8],
       [ 9, 10, 11, 12]])

In [30]:
a.shape

(3, 4)

```python
a[row,col]
```

In [31]:
# first 2 rows, columns 1 & 2 (remember 0-index!)
b = a[:2, 1:3]
b

array([[2, 3],
       [6, 7]])
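One difference from lists is worth a quick sketch: slicing a list makes a new list, but a basic slice of a NumPy array is a *view* that shares memory with the original. Changing the slice changes the original unless you ask for a copy.

```python
import numpy as np

a = np.array([[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]])

b = a[:2, 1:3]           # a view: shares memory with `a`
b[0, 0] = 99             # also changes a[0, 1]
value_after_view_edit = a[0, 1]

c = a[:2, 1:3].copy()    # an independent copy
c[0, 0] = -1             # does not touch `a`
value_after_copy_edit = a[0, 1]

print(value_after_view_edit, value_after_copy_edit)
```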

### Datatypes in NumPy


In [34]:
type(a)

numpy.ndarray

In [32]:
a.dtype

dtype('int64')

In [35]:
names_list.dtype

AttributeError: 'list' object has no attribute 'dtype'

In [36]:
a.astype(np.float64)

array([[ 1.,  2.,  3.,  4.],
       [ 5.,  6.,  7.,  8.],
       [ 9., 10., 11., 12.]])
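A few dtype behaviors to keep in mind, sketched below: `astype` returns a new array and leaves the original untouched, mixing integers and floats produces a float array, and converting floats to integers drops the fractional part rather than rounding.

```python
import numpy as np

a = np.array([[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]])

a_float = a.astype(np.float64)   # new array; `a` keeps its integer dtype
mixed = np.array([1, 2.5])       # mixing ints and floats upcasts to float64
truncated = np.array([1.9, -1.9]).astype(np.int64)  # fractional part is dropped

print(a.dtype.kind, a_float.dtype)   # kind "i" means a signed integer type
print(mixed.dtype)
print(truncated)
```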

### More Array Math 
#### Adding matrices 

In [38]:
x = np.array([[1,2],[3,4]], dtype=np.float64)
y = np.array([[5,6],[7,8]], dtype=np.float64)

display(x,y)
# Elementwise sum; both produce the array
# [[ 6.0  8.0]
#  [10.0 12.0]]
print(x + y)

array([[1., 2.],
       [3., 4.]])

array([[5., 6.],
       [7., 8.]])

[[ 6.  8.]
 [10. 12.]]


In [39]:
print(np.add(x, y))

[[ 6.  8.]
 [10. 12.]]


#### Subtracting matrices 

In [40]:
# Elementwise difference; both produce the array
# [[-4.0 -4.0]
#  [-4.0 -4.0]]
print(x - y)

[[-4. -4.]
 [-4. -4.]]


In [41]:
print(np.subtract(x, y))

[[-4. -4.]
 [-4. -4.]]


#### Multiplying matrices 

In [42]:
# Elementwise product; both produce the array
# [[ 5.0 12.0]
#  [21.0 32.0]]
print(x * y)

[[ 5. 12.]
 [21. 32.]]


In [43]:
print(np.multiply(x, y))

[[ 5. 12.]
 [21. 32.]]
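Note that `*` and `np.multiply` multiply matching positions; they do not perform the matrix product from linear algebra. For that, NumPy provides the `@` operator (equivalently `np.matmul`), as this sketch with the same `x` and `y` shows:

```python
import numpy as np

x = np.array([[1, 2], [3, 4]], dtype=np.float64)
y = np.array([[5, 6], [7, 8]], dtype=np.float64)

elementwise = x * y                # multiplies matching positions
matrix_product = x @ y             # rows of x combined with columns of y
same_as_matmul = np.matmul(x, y)   # function form of the @ operator

print(elementwise)
print(matrix_product)
```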


#### Dividing matrices 

In [44]:
# Elementwise division; both produce the array
# [[ 0.2         0.33333333]
#  [ 0.42857143  0.5       ]]
print(x / y)

[[0.2        0.33333333]
 [0.42857143 0.5       ]]


In [45]:
print(np.divide(x, y))

[[0.2        0.33333333]
 [0.42857143 0.5       ]]


#### Raising matrices to powers 

In [46]:
# Elementwise square root; both produce the same array
# [[ 1.          1.41421356]
#  [ 1.73205081  2.        ]]
print(x ** (1/2))

[[1.         1.41421356]
 [1.73205081 2.        ]]


In [47]:
print(np.sqrt(x))

[[1.         1.41421356]
 [1.73205081 2.        ]]


### NumPy is faster

Below, you will find a piece of code we will use to compare the speed of operations on a list and operations on an array. In this speed test, we will use the standard-library module [time](https://docs.python.org/3/library/time.html).

In [48]:
import time
import numpy as np

size_of_vec = 1000

def pure_python_version():
    t1 = time.time()
    X = range(size_of_vec)
    Y = range(size_of_vec)
    Z = [X[i] + Y[i] for i in range(len(X))]
    return time.time() - t1

def numpy_version():
    t1 = time.time()
    X = np.arange(size_of_vec)
    Y = np.arange(size_of_vec)
    Z = X + Y
    return time.time() - t1


t1 = pure_python_version()
t2 = numpy_version()
print("python: " + str(t1), "numpy: "+ str(t2))
print("Numpy is in this example " + str(t1/t2) + " times faster!")

python: 0.0002601146697998047 numpy: 3.886222839355469e-05
Numpy is in this example 6.693251533742331 times faster!
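A single timing like this can vary a lot from run to run. One alternative, sketched here under the assumption that the same two operations are being compared, is the standard library's `timeit` module, which runs each function many times; taking the minimum over several repeats reduces noise. The function names and the `number`/`repeat` values are illustrative choices.

```python
import timeit
import numpy as np

size_of_vec = 1000

def add_lists():
    # Element-by-element addition in base Python
    X = range(size_of_vec)
    Y = range(size_of_vec)
    return [X[i] + Y[i] for i in range(len(X))]

def add_arrays():
    # The same addition done on whole arrays at once
    X = np.arange(size_of_vec)
    Y = np.arange(size_of_vec)
    return X + Y

# Each measurement runs the function 100 times; keep the best of 5 repeats
python_time = min(timeit.repeat(add_lists, number=100, repeat=5))
numpy_time = min(timeit.repeat(add_arrays, number=100, repeat=5))

print("python:", python_time, "numpy:", numpy_time)
print("ratio (python / numpy):", python_time / numpy_time)
```

The exact ratio depends on your machine and on `size_of_vec`, so treat any single number as a rough indication rather than a fixed speed-up.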


# ACTIVITY: Pair Programming 

In pairs, run the speed test with a different value of `size_of_vec`, and share your results with the class.

## 3. Making our own module
![modlife](https://media1.giphy.com/media/dW0KhIROCaAdCO0V3S/giphy.gif?cid=790b76115d36096678416c65519d8082&rid=giphy.gif)

In [1]:
# this option will re-import your module each time you save an update to it
%load_ext autoreload
%autoreload 2

In [2]:
import temperizer as tp

## Example: Convert F to C

1. This function is already implemented in `temperizer.py`.
2. Notice that we can call the imported function and see the result.

In [3]:
# 32F should equal 0C
tp.convert_f_to_c(32)

0.0

In [4]:
# -40F should equal -40C
tp.convert_f_to_c(-40)

-40.0

In [5]:
# 212F should equal 100C
tp.convert_f_to_c(212)

100.0
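The contents of `temperizer.py` are not shown here, so the following is only a sketch of what such a function could look like, using the standard conversion formula C = (F - 32) × 5/9. The parameter name `temperature_f` follows the stub style used elsewhere in this notebook; the actual implementation in the repository may differ.

```python
def convert_f_to_c(temperature_f):
    """Convert Fahrenheit to Celsius."""
    # Subtract the freezing-point offset, then rescale from F degrees to C degrees
    return (temperature_f - 32) * 5 / 9


print(convert_f_to_c(32))
print(convert_f_to_c(-40))
print(convert_f_to_c(212))
```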

## Your turn: Convert C to F

1. Find the stub function in `temperizer.py`
2. The word `pass` means "this space intentionally left blank."
3. Add your code _in place of_ the `pass` keyword, _below_ the docstring.
4. Run these cells and make sure that your code works.

In [None]:
# 0C should equal 32F
tp.convert_c_to_f(0)

In [None]:
# -40C should equal -40F
tp.convert_c_to_f(-40)

In [None]:
# 100C should equal 212F
tp.convert_c_to_f(100)

## Next: Adding New Functions

You need to add support for Kelvin to the `temperizer` library.

1. Create new _stub functions_ in `temperizer.py`:

    * `convert_c_to_k`
    * `convert_f_to_k`
    * `convert_k_to_c`
    * `convert_k_to_f`

    Start each function with a docstring and the `pass` keyword, e.g.:

    ```python
    def convert_f_to_k(temperature_f):
        """Convert Fahrenheit to Kelvin."""
        pass
    ```

2. Add cells to this notebook to test and validate these functions, similar to the ones above.

3. Then, go back to `temperizer.py` to replace `pass` with your code.

4. Run the notebook cells to make sure that your new functions work.

#### Small note:

Docstrings (the text between triple quotes, `""" """`) allow us to create self-documenting code. 

Later, if you forget what a function does, you can use `?` in Jupyter or the built-in `help()` function the same way you would with other functions, and your documentation will show up!

In [None]:
help(tp.convert_f_to_c)
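To see that this works even for an unfinished function, here is a small sketch using the stub from above, defined directly rather than imported from `temperizer`. The docstring is stored on the function's `__doc__` attribute, which is what `help()` displays.

```python
def convert_f_to_k(temperature_f):
    """Convert Fahrenheit to Kelvin."""
    pass  # stub: no implementation yet, so calling it returns None


# The docstring is available even though the function body is empty
docstring = convert_f_to_k.__doc__
print(docstring)

# help() prints the signature together with the docstring
help(convert_f_to_k)
```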

### Extra credit:

Make a function in your `temperizer` module that takes a temperature in F and prints out:

```
The temperature [number] F is:
    - x in C
    - y in k
```

In [None]:
tp.convert_f_to_all(89)

## Level-Up (Optional)

- Make a version of `tp.convert_f_to_all` that returns a pandas DataFrame with the results instead of print statements.

In [None]:
tp.convert_f_to_all_level_up(89)

## Congrats!!

You've now made your own module of temperature conversion functions!

#### _Our goals today were to be able to_: <br/>

- Identify and import Python modules and packages (libraries)
- Identify differences between NumPy and base Python in usage and operation
- Create a new module of our own