# [CptS 215 Data Analytics Systems and Algorithms](https://piazza.com/wsu/fall2017/cpts215/home)
[Washington State University](https://wsu.edu)

[Gina Sprint](http://eecs.wsu.edu/~gsprint/)
# L4-1 Numpy and Scipy

Learner objectives for this lesson:
* Utilize numpy arrays and notation
* Utilize Scipy for scientific computing


## Acknowledgments
Content used in this lesson is based upon information in the following sources:
* [Scipy website](https://www.scipy.org/)
* [Numpy website](http://www.numpy.org/)
* Python for Data Analysis by Wes McKinney

## Scipy Ecosystem Overview
From the [Scipy website](https://www.scipy.org/):
>SciPy (pronounced "Sigh Pie") is a Python-based ecosystem of open-source software for mathematics, science, and engineering. In particular, these are some of the core packages:
* [Numpy](http://www.numpy.org/): Base N-dimensional array package
* Scipy library: Fundamental library for scientific computing
* Matplotlib: Comprehensive 2D plotting
* IPython: Enhanced interactive console
* Sympy: Symbolic mathematics
* Pandas: Data structures and analysis

In this class, we will use all of the above, except for Sympy.

### Numpy
From the [Numpy website](http://www.numpy.org/):
>NumPy is the fundamental package for scientific computing with Python. It contains among other things:
* a powerful N-dimensional array object (`ndarray`)
* sophisticated (broadcasting) functions
* tools for integrating C/C++ and Fortran code
* useful linear algebra, Fourier transform, and random number capabilities

>Besides its obvious scientific uses, NumPy can also be used as an efficient multi-dimensional container of generic data. Arbitrary data-types can be defined. This allows NumPy to seamlessly and speedily integrate with a wide variety of databases.

Typically, `numpy` is imported as `np`:

In [3]:
import numpy as np

### `ndarray` Object
Numpy's N-dimensional array object, `ndarray`, is one of the main reasons to use Numpy for data analytics. `ndarray` is a fast, flexible container for large data sets in Python. We will often use `ndarray` objects in lieu of Python list objects because `ndarray` supports mathematical operations on whole blocks of data using similar syntax to the equivalent operations on scalars. 

Let's take a look at an example. Let's say we have a list of the numbers 0 through 10:

In [4]:
x = list(range(11))
x = np.array(x)
print(x)

[ 0  1  2  3  4  5  6  7  8  9 10]


Note: We can omit the typecast to a list, because a `range` object can be converted directly to an `ndarray` object.
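A minimal sketch of that shortcut, reusing the same 0 through 10 sequence as the cell above:

```python
import numpy as np

# np.array() accepts a range object directly; no list() call is needed
x = np.array(range(11))
print(x)
```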

We can also make 2-D and N-D arrays. Numpy will "pretty print" the `ndarray` such that it is organized in a matrix format, instead of linear like the default printing for lists:

In [5]:
x = [[1, 2, 3], [4, 5, 6]]
print(x)
x = np.array(x)
print(x)
print("Number of dimensions: %d" %(x.ndim))
print("Shape (rows, cols): %s" %(str(x.shape)))
print("Datatype of items: %s" %(x.dtype))

# converting int items to float items
# astype() creates a new array
x_floats = x.astype(np.float64)
print(x_floats)
print("Datatype of items: %s" %(x_floats.dtype))

[[1, 2, 3], [4, 5, 6]]
[[1 2 3]
 [4 5 6]]
Number of dimensions: 2
Shape (rows, cols): (2, 3)
Datatype of items: int32
[[ 1.  2.  3.]
 [ 4.  5.  6.]]
Datatype of items: float64


### `arange()`, `ones()`, and `zeros()`
Instead of using `range()` and then converting to an `ndarray`, we can create an `ndarray` object directly in a few ways:

In [14]:
x = np.arange(0, 11)
print(x)

x1 = np.arange(10)
print(x1)

x2 = np.ones(10)
print(x2)

x3 = np.zeros(10)
print(x3)

[ 0  1  2  3  4  5  6  7  8  9 10]
[0 1 2 3 4 5 6 7 8 9]
[ 1.  1.  1.  1.  1.  1.  1.  1.  1.  1.]
[ 0.  0.  0.  0.  0.  0.  0.  0.  0.  0.]
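These creation functions take a few optional arguments that the cell above does not use. The following is a small sketch, with illustrative variable names, of a step size for `arange()`, a shape tuple for `zeros()`, and a `dtype` for `ones()`:

```python
import numpy as np

# arange(start, stop, step): stop is exclusive, so 11 includes 10
evens = np.arange(0, 11, 2)
print(evens)

# pass a tuple to create a 2-D array of (rows, cols)
grid = np.zeros((2, 3))
print(grid.shape)

# ones() and zeros() produce floats by default; dtype requests another type
int_ones = np.ones(4, dtype=int)
print(int_ones)
```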


### Vectorization
Now, say we want to add two equal-length sequences together. Using lists, we have to write a loop, such as the following:

In [1]:
x = range(11)
y = range(10, 21)
z = []
for i in range(len(x)):
    z.append(x[i] + y[i])
print(z)

[10, 12, 14, 16, 18, 20, 22, 24, 26, 28, 30]


Using an `ndarray`, we can *vectorize* the addition operation so that it applies to each item in the sequence, without writing a loop:

In [15]:
x = np.arange(10)
print(x)
x += 1
print(x)

[0 1 2 3 4 5 6 7 8 9]
[ 1  2  3  4  5  6  7  8  9 10]


Vectorization enables you to express batch operations on data without writing any loops.
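For comparison with the list-based loop earlier in this section, here is a sketch of the same two-sequence addition written with `ndarray` objects. The names `x`, `y`, and `z` mirror the loop version:

```python
import numpy as np

x = np.arange(11)       # 0 through 10
y = np.arange(10, 21)   # 10 through 20

# element-wise addition of two equal-length arrays, no loop required
z = x + y
print(z)
```

The result holds the same values as the list `z` built by the loop.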

Operations between differently sized arrays are called *broadcasting*. For example, we can broadcast a scalar (i.e., an array of length one) operation to each item in an array:

In [52]:
x = np.array(range(11))
x *= 2
print(x)

[ 0  2  4  6  8 10 12 14 16 18 20]


Note: See Chapter of Python for Data Analysis or the [Numpy docs](https://docs.scipy.org/doc/numpy/user/basics.broadcasting.html) if you want to learn more about broadcasting.
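Broadcasting is not limited to scalars. As a small sketch (the arrays `table` and `row` are made up for illustration), a 1-D array whose length matches the number of columns can be broadcast across every row of a 2-D array:

```python
import numpy as np

table = np.arange(6).reshape(2, 3)   # [[0 1 2], [3 4 5]]
row = np.array([10, 20, 30])         # shape (3,)

# row is added to each row of table
result = table + row
print(result)
```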

Relational operators (`==`, `!=`, `<`, `<=`, `>`, `>=`) can also be vectorized:

In [53]:
m_names = np.array(["Mary", "Michael", "Margaret", "Mary", "Marcus", "Molly"])
m_ages =  np.array([28    , 72       , 12        , 34    , 40      , 68])
# marys is a Boolean array
marys = m_names == "Mary"
print(m_names)
print(marys)

print(m_ages[marys])

['Mary' 'Michael' 'Margaret' 'Mary' 'Marcus' 'Molly']
[ True False False  True False False]
[28 34]


Boolean operators (`and`, `or`, and `not`) can be vectorized as well. For vectorized `and`, use `&`. For vectorized `or`, use `|`. For vectorized `not`, use `~`.

Note: The reserved keywords `and` and `or` do not work with Boolean arrays.

In [54]:
m_names = np.array(["Mary", "Michael", "Margaret", "Mary", "Marcus", "Molly"])
m_ages =  np.array([28    , 72       , 12        , 34    , 40      , 68])
mary_marcus = (m_names == "Mary") | (m_names == "Marcus")
print(m_names)
print(mary_marcus)

print(m_ages[mary_marcus])

['Mary' 'Michael' 'Margaret' 'Mary' 'Marcus' 'Molly']
[ True False False  True  True False]
[28 34 40]
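The other Boolean operators follow the same pattern. The sketch below reuses the names and ages from the cell above to show `~` (element-wise negation of a Boolean array) and `&`. The parentheses around each comparison matter, because `&` and `|` bind more tightly than the relational operators:

```python
import numpy as np

m_names = np.array(["Mary", "Michael", "Margaret", "Mary", "Marcus", "Molly"])
m_ages = np.array([28, 72, 12, 34, 40, 68])

# ~ negates each item of the Boolean array
not_mary = ~(m_names == "Mary")
print(m_ages[not_mary])

# & requires both conditions to hold for an item
older_marys = (m_names == "Mary") & (m_ages > 30)
print(m_ages[older_marys])
```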


### Indexing
Indexing `ndarray` objects works just like with lists:

In [24]:
x = np.arange(10)
print(x)
print(x[3])

[0 1 2 3 4 5 6 7 8 9]
3


We can also specify indices into N-dimensional `ndarray` objects using commas:

In [39]:
ones = np.ones((2, 3))
print(ones[0][0])
# using a comma
print(ones[0, 0])

1.0
1.0
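Because every item of `ones` is the same, it can be hard to see which item was selected. The sketch below uses an illustrative array with distinct values and also combines comma indexing with `:` to pull out a whole row or column:

```python
import numpy as np

grid = np.arange(6).reshape(2, 3)   # [[0 1 2], [3 4 5]]

item = grid[1, 2]      # row 1, column 2
second_row = grid[1]   # a single index selects a whole row
first_col = grid[:, 0] # all rows, column 0

print(item)
print(second_row)
print(first_col)
```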


### Assignment
Just like with lists, we can update values in an `ndarray` using the assignment operator. For this example, we will work with a 3x4 array of random data:

In [58]:
from numpy.random import randn
rand_data = randn(3, 4)
print(rand_data)
rand_data[2][0] = 100
print(rand_data)

# Boolean array for negative values
negatives = rand_data < 0
print(negatives)
# set the negative values to 0
rand_data[negatives] = 0
print(rand_data)

[[-0.32142165 -0.1708455   0.43744892  0.83113333]
 [ 0.95326129 -0.59611479 -1.0148598  -0.45504578]
 [ 1.36409435  0.35947998 -0.93991428  0.35298872]]
[[  -0.32142165   -0.1708455     0.43744892    0.83113333]
 [   0.95326129   -0.59611479   -1.0148598    -0.45504578]
 [ 100.            0.35947998   -0.93991428    0.35298872]]
[[ True  True False False]
 [False  True  True  True]
 [False False  True False]]
[[   0.            0.            0.43744892    0.83113333]
 [   0.95326129    0.            0.            0.        ]
 [ 100.            0.35947998    0.            0.35298872]]


### Slicing
`ndarray` slicing works similarly to list slicing; however, there are a few subtle differences:
* Slices are "views" of the `ndarray`, not copies
* Assigning a scalar (or an `ndarray` of a different dimension than the slice) to a slice broadcasts the value across the slice

In [30]:
x_list = list(range(10))
print("x_list: %s" %(x_list))
chunk = x_list[3:7]
print("chunk: %s" %(chunk))
# doesn't modify x_list because chunk is a copy
chunk[0] = 50
print("chunk: %s" %(chunk))
print("x_list: %s" %(x_list))


x = np.arange(10)
print(x)
print("x: %s" %(x))
chunk = x[3:7]
print("chunk: %s" %(chunk))
# does modify x because chunk is a view
chunk[0] = 50
print("chunk: %s" %(chunk))
print("x: %s" %(x))

# broadcasts
x[2:5] = 100
print(x)

x_list: [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
chunk: [3, 4, 5, 6]
chunk: [50, 4, 5, 6]
x_list: [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
[0 1 2 3 4 5 6 7 8 9]
x: [0 1 2 3 4 5 6 7 8 9]
chunk: [3 4 5 6]
chunk: [50  4  5  6]
x: [ 0  1  2 50  4  5  6  7  8  9]
[  0   1 100 100 100   5   6   7   8   9]


Note: If you want a copy of an `ndarray` slice instead of a view, you can copy the slice using the `ndarray` method `copy()`:

In [35]:
x = np.arange(10)
print(x)
print("x: %s" %(x))
chunk = x[3:7].copy()
print("chunk: %s" %(chunk))
# doesn't modify x because chunk is now a copy
chunk[0] = 50
print("chunk: %s" %(chunk))
print("x: %s" %(x))

[0 1 2 3 4 5 6 7 8 9]
x: [0 1 2 3 4 5 6 7 8 9]
chunk: [3 4 5 6]
chunk: [50  4  5  6]
x: [0 1 2 3 4 5 6 7 8 9]
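If it is unclear whether a variable holds a view or a copy, one way to check is `np.shares_memory()` or the `base` attribute. This is a small sketch with illustrative variable names:

```python
import numpy as np

x = np.arange(10)
view = x[3:7]
copied = x[3:7].copy()

# a view shares its underlying data with x; a copy does not
view_shares = np.shares_memory(x, view)
copy_shares = np.shares_memory(x, copied)
print(view_shares)
print(copy_shares)

# base refers to the array a view was taken from (None for a copy)
print(view.base is x)
print(copied.base is None)
```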


### Reshaping
We can change the shape of an `ndarray` object, i.e. we can change the dimensions. For example, say we have a 1D array that we want to change into a 2D array:

In [59]:
ints = np.arange(10)
print(ints.shape)
print(ints)
ints = ints.reshape(5, 2)
print(ints.shape)
print(ints)

(10,)
[0 1 2 3 4 5 6 7 8 9]
(5, 2)
[[0 1]
 [2 3]
 [4 5]
 [6 7]
 [8 9]]
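When only one of the new dimensions is known, `reshape()` accepts `-1` for the other and infers it from the number of items. A brief sketch, with illustrative names:

```python
import numpy as np

ints = np.arange(10)

# -1 asks Numpy to infer the row count: 10 items / 2 columns = 5 rows
pairs = ints.reshape(-1, 2)
print(pairs.shape)

# reshape(-1) flattens back to one dimension
flat = pairs.reshape(-1)
print(flat.shape)
```

The total number of items must stay the same; a request such as `ints.reshape(3, 4)` raises a `ValueError` because 10 items cannot fill 12 slots.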


### Transposing
Matrix transposition turns the rows of the matrix into columns and the columns into rows. `ndarray` has support for transposing:

In [68]:
x = np.arange(6).reshape((2, 3))
print(x)
print(x.shape)
x_t = x.T
print(x_t)
print(x_t.shape)

[[0 1 2]
 [3 4 5]]
(2, 3)
[[0 3]
 [1 4]
 [2 5]]
(3, 2)
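One detail worth knowing: like a slice, `.T` gives back a view of the original data rather than a copy. The sketch below, which assumes the same 2x3 array as the cell above, shows that a change made through the transpose is visible in the original:

```python
import numpy as np

x = np.arange(6).reshape((2, 3))
x_t = x.T

# x_t[0, 1] and x[1, 0] refer to the same item
x_t[0, 1] = 99
print(x)
print(np.shares_memory(x, x_t))
```

Call `x.T.copy()` when an independent transposed array is needed.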


### `ndarray` Functions
Numpy provides several fast, vectorized universal functions (ufuncs) that perform element-wise operations on the data in an `ndarray`.

#### Unary ufuncs
Unary ufuncs accept a single `ndarray` and apply an operation element-wise. Example ufuncs include:
* `np.sqrt()`: Element-wise square root
* `np.absolute()`: Element-wise absolute value
* `np.sin()`: Element-wise trigonometric sine

For a full list of available ufuncs, please read the [Numpy docs](https://docs.scipy.org/doc/numpy/reference/ufuncs.html#available-ufuncs). There are over 60 of them!

In [75]:
nums = np.arange(10)

print(np.sqrt(nums))
print(np.absolute(nums))

[ 0.          1.          1.41421356  1.73205081  2.          2.23606798
  2.44948974  2.64575131  2.82842712  3.        ]
[0 1 2 3 4 5 6 7 8 9]
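The trigonometric sine ufunc, `np.sin()`, is not exercised in the cell above. Here is a short sketch using a few illustrative angles in radians. The results are rounded because floating-point arithmetic leaves a tiny nonzero value for the sine of pi:

```python
import numpy as np

# angles in radians: 0, pi/2, and pi
angles = np.array([0.0, np.pi / 2, np.pi])
sines = np.sin(angles)

# round to hide floating-point error near zero
print(np.round(sines, 3))
```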


#### Binary ufuncs
Binary ufuncs accept two `ndarray` objects (or an `ndarray` and a scalar) and apply an operation element-wise. Example binary ufuncs include:
* `np.power()`: Element-wise exponentiation
* `np.maximum()`: Element-wise maximum comparison
* `np.minimum()`: Element-wise minimum comparison

For a full list of available ufuncs, please read the [Numpy docs](https://docs.scipy.org/doc/numpy/reference/ufuncs.html#available-ufuncs). There are over 60 of them!

In [76]:
nums = np.arange(5)
nums2 = np.arange(5) + 1

print(np.power(nums, 2))
print(np.power(nums, nums2))
print(np.maximum(nums, nums2))

[ 0  1  4  9 16]
[   0    1    8   81 1024]
[1 2 3 4 5]
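To round out the list above, the following sketch shows `np.minimum()` on the same two arrays. It also checks that the `**` operator on arrays gives the same result as `np.power()`:

```python
import numpy as np

nums = np.arange(5)        # [0 1 2 3 4]
nums2 = np.arange(5) + 1   # [1 2 3 4 5]

# element-wise minimum of the two arrays
smaller = np.minimum(nums, nums2)
print(smaller)

# the ** operator is the operator form of np.power()
same = np.array_equal(np.power(nums, 2), nums ** 2)
print(same)
```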
