# Module 1 - Introducing Modules and NumPy

### Introduction

#### *Our goals today are to be able to*:  

- Identify and import Python modules
- Use the Python Standard Library
- Install new modules if we need them
- Identify differences between NumPy and base Python in usage and operation

#### *Big questions for this lesson*:  
- What is a package, what do packages do, and why might we want to use them?
- When do we want to use NumPy?

### 1. Importing Python Libraries


Previously, we wrote a function to calculate the mean of a list. That was tedious.

Thankfully, other people have written and optimized functions and wrapped them into **modules and packages** that we can call and use in our analysis.

To import a package, type `import` followed by the name of the library as shown below, or use `from` and `import` to import specific objects.

In [1]:
import math
from collections import Counter

In [2]:
type(math)

module

In [3]:
type(Counter)

type

In [4]:
# os & sys
import os
import sys

In [5]:
os.getcwd()

'C:\\Users\\joshu\\code\\hbs\\hbs-ds-060120\\module-1\\day-4-libraries-numpy'

In [6]:
os.environ

environ{'ACLOCAL_PATH': 'C:\\Program Files\\Git\\mingw64\\share\\aclocal;C:\\Program Files\\Git\\usr\\share\\aclocal',
        'ALLUSERSPROFILE': 'C:\\ProgramData',
        'APPDATA': 'C:\\Users\\joshu\\AppData\\Roaming',
        'COMMONPROGRAMFILES': 'C:\\Program Files\\Common Files',
        'COMPUTERNAME': 'JOSH-LENOVO',
        'COMSPEC': 'C:\\WINDOWS\\system32\\cmd.exe',
        'CONDA_DEFAULT_ENV': 'base',
        'CONDA_EXE': 'C:/Users/joshu/anaconda3/Scripts/conda.exe',
        'CONDA_PREFIX': 'C:\\Users\\joshu\\anaconda3',
        'CONDA_PROMPT_MODIFIER': '(base) ',
        'CONDA_PYTHON_EXE': 'C:/Users/joshu/anaconda3/python.exe',
        'CONDA_SHLVL': '1',
        'CONFIG_SITE': 'C:/Program Files/Git/mingw64/etc/config.site',
        'COMMONPROGRAMFILES(X86)': 'C:\\Program Files (x86)\\Common Files',
        'COMMONPROGRAMW6432': 'C:\\Program Files\\Common Files',
        'DISPLAY': 'needs-to-be-defined',
        'DRIVERDATA': 'C:\\Windows\\System32\\Drivers\\DriverData',
 

In [7]:
os.path.join('folder1','folder2','folder3')

'folder1\\folder2\\folder3'
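
The separator in the output above is a backslash because this notebook was run on Windows; `os.path.join` uses whatever separator the current platform expects. A small sketch of building a path and taking it apart again without hard-coding the separator (the folder and file names are placeholders, not files from this lesson):

```python
import os

# Build a path from parts; os.sep is "\\" on Windows and "/" on macOS/Linux
path = os.path.join('folder1', 'folder2', 'data.csv')
print(path)

# Split the path back into its directory and file name
directory, filename = os.path.split(path)
print(directory)
print(filename)

# Separate the extension from the rest of the file name
stem, extension = os.path.splitext(filename)
print(stem, extension)
```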

In [8]:
sys.path

['C:\\Users\\joshu\\code\\hbs\\hbs-ds-060120\\module-1\\day-4-libraries-numpy',
 'C:\\Users\\joshu\\anaconda3\\python37.zip',
 'C:\\Users\\joshu\\anaconda3\\DLLs',
 'C:\\Users\\joshu\\anaconda3\\lib',
 'C:\\Users\\joshu\\anaconda3',
 '',
 'C:\\Users\\joshu\\anaconda3\\lib\\site-packages',
 'C:\\Users\\joshu\\anaconda3\\lib\\site-packages\\win32',
 'C:\\Users\\joshu\\anaconda3\\lib\\site-packages\\win32\\lib',
 'C:\\Users\\joshu\\anaconda3\\lib\\site-packages\\Pythonwin',
 'C:\\Users\\joshu\\anaconda3\\lib\\site-packages\\IPython\\extensions',
 'C:\\Users\\joshu\\.ipython']

In [11]:
# math
import math


In [12]:
math.pi

3.141592653589793

In [13]:
math.pow(2,3)

8.0

In [14]:
# datetime & time
import datetime

x = datetime.datetime.now()
print(x)

2020-06-04 08:09:33.921050


In [17]:
print(x.year)
print(x.strftime("Today is %A, %B %d %Y."))
print(x.strftime("%Y-%m-%d"))

2020
Today is Thursday, June 04 2020.
2020-06-04


In [20]:
datetime.datetime.utcnow()
#from datetime import datetime

datetime.datetime(2020, 6, 4, 15, 30, 58, 613089)

In [19]:
x = datetime.datetime(2020, 5, 17)

print(x)

2020-05-17 00:00:00


[Datetime formats](https://www.w3schools.com/python/python_datetime.asp)
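
The format codes used with `strftime` also work in the other direction with `strptime`, and subtracting two datetimes gives a `timedelta`. A minimal sketch, reusing the two dates that appear in the cells above:

```python
import datetime

# Parse a string into a datetime using the same format codes as strftime
start = datetime.datetime.strptime("2020-05-17", "%Y-%m-%d")
end = datetime.datetime(2020, 6, 4)

# Subtracting two datetimes gives a timedelta
elapsed = end - start
print(elapsed.days)                          # 18

# Adding a timedelta shifts a datetime
one_week_later = start + datetime.timedelta(days=7)
print(one_week_later.strftime("%Y-%m-%d"))   # 2020-05-24
```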

In [None]:
# collections
from collections import Counter, defaultdict, namedtuple
c = Counter()                           # a new, empty counter
c = Counter('gallahad')                 # a new counter from an iterable
c = Counter({'red': 4, 'blue': 2})      # a new counter from a mapping
c = Counter(cats=4, dogs=8) 
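
The cell above only constructs counters. As a short illustration of what a `Counter` is for, the sketch below counts the letters of the same string and looks up a few results:

```python
from collections import Counter

# Count how often each letter appears in a string
c = Counter('gallahad')
print(c['a'])            # 3
print(c['z'])            # 0: a missing key counts as zero instead of raising KeyError
print(c.most_common(2))  # [('a', 3), ('l', 2)]
```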

In [None]:
s = [('yellow', 1), ('blue', 2), ('yellow', 3), ('blue', 4), ('red', 1)]
d = defaultdict(list)
for k, v in s:
    d[k].append(v)

sorted(d.items())
# [('blue', [2, 4]), ('red', [1]), ('yellow', [1, 3])]

In [None]:
Point = namedtuple('Point', ['x', 'y'])
p = Point(11, y=22)     # instantiate with positional or keyword arguments
p[0] + p[1]             # indexable like the plain tuple (11, 22)


In [None]:
x, y = p                # unpack like a regular tuple
x, y
p.x + p.y               # fields also accessible by name
p                       # readable __repr__ with a name=value style


In [None]:
# pprint
import pprint
tup = ('spam', ('eggs', ('lumberjack', ('knights', ('ni', ('dead', ('parrot', ('fresh fruit',))))))))
pp = pprint.PrettyPrinter(depth=6)
pp.pprint(tup)

In [None]:
# random
import random

print(random.random())

print(random.randrange(1, 10))
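
The numbers above change on every run. When a result needs to be repeatable, the generator can be seeded first. A small sketch (the seed value 42 and the color list are arbitrary choices for illustration):

```python
import random

random.seed(42)                 # fix the starting state of the generator
first = [random.randrange(1, 10) for _ in range(5)]

random.seed(42)                 # same seed, same sequence
second = [random.randrange(1, 10) for _ in range(5)]

print(first == second)          # True

# choice picks one item at random from a sequence
print(random.choice(['red', 'green', 'blue']))
```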

In [None]:
# zipfile, gzip, zlib, bz2

In [None]:
# pdb

### 2. NumPy

![numpy](https://raw.githubusercontent.com/donnemartin/data-science-ipython-notebooks/master/images/numpy.png)

[NumPy](https://www.numpy.org/) is the fundamental package for scientific computing with Python. 


In [59]:
import numpy as np
import numpy

x = numpy.array([1, 2, 3])
print(x)

# Many packages have a canonical way to import them

y = np.array([4, 5, 6])
print(y)

[1 2 3]
[4 5 6]


With NumPy we can now compute the **mean**, and other common statistics, of lists and arrays in a single call.

In [None]:
example = [4, 3, 25, 40, 62, 20]
print(np.mean(example))
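
For comparison, here is a hand-written mean next to the NumPy call. The `mean` function below is an assumed reconstruction of the kind of function written in the earlier lesson, not the original code:

```python
import numpy as np

def mean(values):
    """Plain-Python mean: the sum of the values divided by how many there are."""
    return sum(values) / len(values)

example = [4, 3, 25, 40, 62, 20]
print(mean(example))        # hand-written version
print(np.mean(example))     # NumPy version, same value up to rounding

# Other summary statistics are one call away
print(np.median(example))   # 22.5
print(np.std(example))
```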

Now let's import some other packages. We will cover more of what NumPy can do later.

In [None]:
import scipy
import pandas as pd
import matplotlib as mpl

In [None]:
# sometimes we will want to import a specific module from a library
import matplotlib.pyplot as plt
from matplotlib.pyplot import plot

# What happens when we uncomment the next line?
# %matplotlib inline

plt.plot(x, y)

In [None]:
# OR we can also import it this way
from matplotlib import pyplot as plt
plt.plot(x, y)

Try importing the `seaborn` library as `sns`, which is the convention.

In [None]:
# your code here


#### Helpful links: library documentation

Libraries have associated documentation to explain how to use the different tools included in a library.

- [NumPy](https://docs.scipy.org/doc/numpy/)
- [SciPy](https://docs.scipy.org/doc/scipy/reference/)
- [Pandas](http://pandas.pydata.org/pandas-docs/stable/)
- [Matplotlib](https://matplotlib.org/contents.html)

### 3. NumPy versus base Python

Now that we know libraries exist, why do we want to use them? Let us compare base Python with NumPy.

Base Python has lists and can do basic math. NumPy adds a helpful object called the array.

NumPy has a few advantages over base Python, which we will look at.

In [1]:
import numpy as np

In [2]:
names_list = ['Bob', 'John', 'Sally']
names_array = np.array(['Bob', 'John', 'Sally'])
print(names_list)
print(names_array)
names_array

['Bob', 'John', 'Sally']
['Bob' 'John' 'Sally']


array(['Bob', 'John', 'Sally'], dtype='<U5')

In [4]:
# Make a list and an array of three numbers
# your code here
my_list = [3,7,22]
my_array = np.array(my_list)

In [5]:
# divide your array by 2
my_array/2

array([ 1.5,  3.5, 11. ])

In [6]:
# divide your list by 2
my_list/2

TypeError: unsupported operand type(s) for /: 'list' and 'int'

NumPy arrays support the `/` operator (which calls the `__truediv__()` method in Python 3) while Python lists do not. NumPy has other features that make it more useful than base Python for working with data.
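
To make the difference concrete, the sketch below calls the method behind `/` directly on the array and shows the usual base-Python workaround for a list, a list comprehension. It reuses the three numbers from the cells above:

```python
import numpy as np

my_list = [3, 7, 22]
my_array = np.array(my_list)

# In Python 3 the / operator calls the __truediv__ method
print(my_array.__truediv__(2))            # same result as my_array / 2

# Lists do not define __truediv__, so we divide item by item instead
print(hasattr(my_list, '__truediv__'))    # False
halved_list = [value / 2 for value in my_list]
print(halved_list)                        # [1.5, 3.5, 11.0]
```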

In [8]:
# shape tells us the length of the array along each dimension

my_array.shape

(3,)

In [10]:
my_array.size

3

In [12]:
# Selection and assignment work as you might expect
my_array[0:-1]

array([3, 7])

Take 5 minutes and explore each of the following functions.  What does each one do?  What is the syntax of each?
- `np.zeros()`
- `np.ones()`
- `np.full()`
- `np.eye()`
- `np.random.random()`

In [14]:
np.zeros([2,3])

array([[0., 0., 0.],
       [0., 0., 0.]])

In [16]:
np.ones([2,3])

array([[1., 1., 1.],
       [1., 1., 1.]])

In [25]:
np.full((4,3),3.14159)

array([[3.14159, 3.14159, 3.14159],
       [3.14159, 3.14159, 3.14159],
       [3.14159, 3.14159, 3.14159],
       [3.14159, 3.14159, 3.14159]])

In [38]:
np.eye(5)

array([[1., 0., 0., 0., 0.],
       [0., 1., 0., 0., 0.],
       [0., 0., 1., 0., 0.],
       [0., 0., 0., 1., 0.],
       [0., 0., 0., 0., 1.]])

In [37]:
np.eye(5,7)

array([[1., 0., 0., 0., 0., 0., 0.],
       [0., 1., 0., 0., 0., 0., 0.],
       [0., 0., 1., 0., 0., 0., 0.],
       [0., 0., 0., 1., 0., 0., 0.],
       [0., 0., 0., 0., 1., 0., 0.]])

In [36]:
np.random.random(18)

array([0.5430205 , 0.62572562, 0.0888654 , 0.94468791, 0.51002672,
       0.80816868, 0.88169618, 0.02230716, 0.05115823, 0.71681903,
       0.40379997, 0.70509291, 0.96427285, 0.70891653, 0.07490823,
       0.80776887, 0.16171268, 0.79103736])

### Slicing in NumPy

In [39]:
# We remember slicing from lists
numbers_list = list(range(10))
numbers_list[3:7]

[3, 4, 5, 6]

In [40]:
# Slicing in NumPy Arrays is very similar!
a = np.array([[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]])
a

array([[ 1,  2,  3,  4],
       [ 5,  6,  7,  8],
       [ 9, 10, 11, 12]])

In [41]:
# first 2 rows, columns 1 & 2 (remember 0-index!)
b = a[:2, 1:3]
b

array([[2, 3],
       [6, 7]])
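
One difference from lists is worth seeing in code: a basic NumPy slice is a view onto the original data, while a list slice is a new list. A short sketch using the same arrays as above (the values 77, -1 and 99 are arbitrary markers):

```python
import numpy as np

a = np.array([[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]])
b = a[:2, 1:3]           # a view onto part of a, not a copy
b[0, 0] = 77             # this also changes a[0, 1]
print(a[0, 1])           # 77

c = a[:2, 1:3].copy()    # an independent copy
c[0, 0] = -1
print(a[0, 1])           # still 77

# A list slice, by contrast, is always a new list
numbers_list = list(range(10))
part = numbers_list[3:7]
part[0] = 99
print(numbers_list[3])   # 3
```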

### Datatypes in NumPy

In [None]:
a.dtype

In [None]:
names_array.dtype

In [None]:
a.astype(np.float64).dtype
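
Every element of an array shares one dtype, which NumPy infers when the array is created. The sketch below shows the inference and a conversion with `astype`. The exact integer type (for example `int32` or `int64`) depends on the platform, so only its kind is printed:

```python
import numpy as np

ints = np.array([1, 2, 3])
floats = np.array([1.0, 2.0, 3.0])
print(ints.dtype.kind)     # 'i' for a signed integer type
print(floats.dtype)        # float64

# Mixing integers and floats upcasts everything to float
mixed = np.array([1, 2.5, 3])
print(mixed.dtype)         # float64

# astype returns a new array; converting float to int truncates toward zero
truncated = np.array([1.9, -1.9]).astype(np.int64)
print(truncated)           # [ 1 -1]
```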

### More Array Math

In [44]:
x = np.array([[1, 2], [3, 4]], dtype=np.float64)
y = np.array([[5, 6], [7, 8]], dtype=np.float64)

# Elementwise sum; both produce the array
# [[ 6.0  8.0]
#  [10.0 12.0]]
print(x + y)
print(np.add(x, y))

[[ 6.  8.]
 [10. 12.]]
[[ 6.  8.]
 [10. 12.]]


In [45]:
# Elementwise difference; both produce the array
# [[-4.0 -4.0]
#  [-4.0 -4.0]]
print(x - y)
print(np.subtract(x, y))

[[-4. -4.]
 [-4. -4.]]
[[-4. -4.]
 [-4. -4.]]


In [46]:
# Elementwise product; both produce the array
# [[ 5.0 12.0]
#  [21.0 32.0]]
print(x * y)
print(np.multiply(x, y))

[[ 5. 12.]
 [21. 32.]]
[[ 5. 12.]
 [21. 32.]]


In [47]:
# Elementwise division; both produce the array
# [[ 0.2         0.33333333]
#  [ 0.42857143  0.5       ]]
print(x / y)
print(np.divide(x, y))

[[0.2        0.33333333]
 [0.42857143 0.5       ]]
[[0.2        0.33333333]
 [0.42857143 0.5       ]]


In [48]:
# Elementwise square root; both produce the same array
# [[ 1.          1.41421356]
#  [ 1.73205081  2.        ]]
print(x ** .5)
print(np.sqrt(x))

[[1.         1.41421356]
 [1.73205081 2.        ]]
[[1.         1.41421356]
 [1.73205081 2.        ]]
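
All of the operators above act element by element, including `*`. As a brief aside, the matrix product is a separate operation, written with `@` or `np.dot`. A sketch with the same `x` and `y`:

```python
import numpy as np

x = np.array([[1, 2], [3, 4]], dtype=np.float64)
y = np.array([[5, 6], [7, 8]], dtype=np.float64)

# Elementwise product: each entry of x times the matching entry of y
print(x * y)           # [[ 5. 12.] [21. 32.]]

# Matrix product: rows of x combined with columns of y
print(x @ y)           # [[19. 22.] [43. 50.]]
print(np.dot(x, y))    # same as x @ y for 2-D arrays
```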


Below, you will find a piece of code we will use to compare the speed of operations on a list and operations on an array. In this speed test, we will use the library [time](https://docs.python.org/3/library/time.html).

In [49]:
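import time
import numpy as np

size_of_vec = 1000


def pure_python_version():
    # perf_counter() has a much finer resolution than time.time()
    t1 = time.perf_counter()
    X = range(size_of_vec)
    Y = range(size_of_vec)
    Z = [X[i] + Y[i] for i in range(len(X))]
    return time.perf_counter() - t1


def numpy_version():
    t1 = time.perf_counter()
    X = np.arange(size_of_vec)
    Y = np.arange(size_of_vec)
    Z = X + Y
    return time.perf_counter() - t1


t1 = pure_python_version()
t2 = numpy_version()
print("python: " + str(t1), "numpy: " + str(t2))
print("Numpy is in this example " + str(t1/t2) + " times faster!")

With `time.time()` in place of `time.perf_counter()`, this cell printed `python: 0.0 numpy: 0.0` on the Windows machine used here and then raised `ZeroDivisionError: float division by zero`, most likely because both runs finished within a single tick of that clock. Timings vary by machine, so no output is shown.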

In [60]:
np.__version__

'1.18.1'

In [55]:
!pip install --upgrade numpy

Requirement already up-to-date: numpy in c:\users\joshu\anaconda3\lib\site-packages (1.18.5)


In pairs, run the speed test with a different number, and share your results with the class.
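
A single timing of very short code is noisy. As a hedged alternative, the sketch below uses the standard library's `timeit` to average several runs. The vector size, the number of repeats and the function names are assumptions chosen for illustration:

```python
import timeit
import numpy as np

size_of_vec = 100_000   # assumed size; try other values


def pure_python_sum():
    X = range(size_of_vec)
    Y = range(size_of_vec)
    return [X[i] + Y[i] for i in range(len(X))]


def numpy_sum():
    X = np.arange(size_of_vec)
    Y = np.arange(size_of_vec)
    return X + Y


repeats = 20
# timeit.timeit returns the total time for all repeats, so divide to get an average
t_python = timeit.timeit(pure_python_sum, number=repeats) / repeats
t_numpy = timeit.timeit(numpy_sum, number=repeats) / repeats
print("python:", t_python, "numpy:", t_numpy)
print("ratio:", t_python / t_numpy)
```

Averaging over several runs smooths out one-off delays, and a larger vector keeps both timings well above the resolution of the clock.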
