# NumPy - Introduction


_NumPy is the base N-dimensional array package._ (See http://www.numpy.org)

Two important capabilities provided by the package are:
1. N-dimensional array objects
1. Advanced computational math: linear algebra, random numbers, Fourier transforms, ...

A tutorial is available at:
- https://docs.scipy.org/doc/numpy-dev/user/quickstart.html

It is common practice to import the NumPy library under the alias `np`:

In [5]:
import numpy as np

## Table of Contents

##### 1. Creating arrays
- With objects
- With functions
- With random

##### 2. Shape and reshape

##### 3. Data types

##### 4. Basic operations 
- Elementwise
- Modifying arrays in-place
- Unary & by-axis

##### 5. Functions & broadcasting

##### 6. Indexing and slicing
- One-dimensional arrays
- Multidimensional arrays

## 1. Creating arrays

NumPy's main object is the multidimensional array. It is a table of elements (usually numbers), __all of the same type__, indexed by a tuple of non-negative integers. In NumPy, dimensions are called axes, and the number of axes is the rank.

##### Creating arrays from objects
Arrays can be created from Python lists, tuples, and pandas `DataFrames` by passing them to the `np.array()` function.

In [10]:
one_list = [1, 2, 3, 4]

a = np.array(one_list)
a

Note below that because the input contains one `float`, all of the numbers are converted to `floats`.

In [12]:
list_of_lists = [[1.1, 2, 3],[4, 5, 6]]

b = np.array(list_of_lists)
b
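
The conversion noted above can be checked by inspecting the `dtype` attribute. This is a small illustrative sketch; the variable names `ints_only` and `mixed` are made up for this example.

```python
import numpy as np

# Only integers in the input: NumPy picks an integer dtype.
ints_only = np.array([[1, 2, 3], [4, 5, 6]])

# One float among the values: every element is stored as a float.
mixed = np.array([[1.1, 2, 3], [4, 5, 6]])

print(ints_only.dtype)   # an integer dtype (the exact width depends on the platform)
print(mixed.dtype)       # float64
```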

In [13]:
list_of_tuples = [(1.1, 1.2),
                  (2.1, 2.2),
                  (3.1, 3.2),
                  (4.1, 4.2)]

b = np.array(list_of_tuples)
b

In [14]:
import pandas as pd
data_frame = pd.DataFrame({'Column1': [2, 4, 6, 8], 'Column2':[5, 7, 9, 11]})
data_frame

In [15]:
c = np.array(data_frame)
c

Every array has a _shape_: a tuple whose length is the rank of the array and whose elements give the length of each dimension.

In [17]:
c.shape
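
As a quick check of how shape and rank relate, the sketch below builds a small 2-D array directly from a list of lists (the name `grid` is only illustrative) and prints its `shape`, `ndim`, and `size` attributes.

```python
import numpy as np

grid = np.array([[2, 5], [4, 7], [6, 9], [8, 11]])

print(grid.shape)   # (4, 2): 4 rows, 2 columns
print(grid.ndim)    # 2, the number of axes (the rank)
print(grid.size)    # 8, the total number of elements
```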

__Exercise__: Use a list of lists of lists to create a 3-D array with shape `(4,3,2)`. 

_Hint: The last two dimensions of an N-dimensional array shape are (..., rows, columns)_

##### Creating arrays with functions

NumPy includes some functions that are helpful for creating arrays:
- `zeros()` and `ones()`
- `arange()`, `linspace()`, and `logspace()`

In [21]:
np.zeros(shape=(3,4))

Note the default `dtype` is `float`. This can be overridden when calling the function.

In [23]:
np.ones((2, 3, 4), dtype=int)

`np.arange()` returns a 1D array filled by a predefined range. Note that the stopping point is not included in the range.

In [25]:
np.arange(start=10, stop=30, step=4)

In [26]:
np.arange(start=1.1, stop=3.4, step=0.2)

`np.linspace()` returns an array of evenly spaced numbers over an interval. Note both endpoints __are__ included.

In [28]:
np.linspace(start=11, stop=14.5, num=6)

In [29]:
np.linspace(1, 8, num=4)
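
The difference in endpoint handling between `np.arange()` and `np.linspace()` is easiest to see side by side. The sketch below uses an arbitrary interval from 0 to 1; `endpoint=False` is an optional `np.linspace()` parameter that drops the stop value.

```python
import numpy as np

# arange excludes the stop value.
stepped = np.arange(0, 1, 0.25)
print(stepped)        # 0, 0.25, 0.5, 0.75

# linspace includes the stop value by default ...
spaced = np.linspace(0, 1, num=5)
print(spaced)         # 0, 0.25, 0.5, 0.75, 1

# ... unless endpoint=False is passed.
half_open = np.linspace(0, 1, num=4, endpoint=False)
print(half_open)      # 0, 0.25, 0.5, 0.75
```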

__Exercise:__ test out `np.logspace()` by specifying `start`, `stop`, and `num`. How does it work?

##### Creating arrays with random
NumPy has several simple randomization functions, such as `rand()` and `randn()`, which fill arrays of predefined dimensions with samples from the _uniform_ and _normal_ distributions, respectively. For a complete list of the random sampling capabilities, see the documentation: https://docs.scipy.org/doc/numpy/reference/routines.random.html.

These are located within the `random` module of NumPy.

In [32]:
np.random.rand(3, 2)

In [33]:
np.random.randn(2, 5, 3)
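
Random output changes on every run, which can make results hard to reproduce. One option, sketched below, is the seeded `Generator` interface created with `np.random.default_rng()`; this assumes NumPy 1.17 or later, and the seed value 42 is arbitrary.

```python
import numpy as np

rng = np.random.default_rng(seed=42)

uniform = rng.random((3, 2))              # uniform samples in [0, 1)
normal = rng.standard_normal((2, 5, 3))   # standard normal samples

# A second generator with the same seed reproduces the same numbers.
repeat = np.random.default_rng(seed=42).random((3, 2))
print(np.array_equal(uniform, repeat))    # True
```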

__Exercise:__ Try creating arrays with values drawn from the `binomial` and `triangular` distributions. See the documentation above.

## 2. Shape and reshape

Earlier we briefly introduced the `.shape` attribute, and specified shapes when creating arrays filled with zeros, ones, and random numbers.

Equally importantly, arrays can be reshaped as needed using `.reshape()` as a function or a method.

In [39]:
a = np.linspace(1, 12, num=12)
a

In [40]:
np.reshape(a, (2, 6))

In [41]:
b = a.reshape((4, 3))
b

Note that reshape reads and writes array elements in row-major order (left-to-right, top-to-bottom).

In [43]:
b.reshape(3, 4)
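
The sketch below illustrates the fill order and two related behaviours of `reshape()`: passing `-1` lets NumPy infer one dimension, and a shape that does not match the number of elements raises a `ValueError`. The names `wide` and `inferred` are only illustrative.

```python
import numpy as np

a = np.arange(1, 13)           # 1, 2, ..., 12

wide = a.reshape(3, 4)
print(wide[0])                 # first row holds the first four values: 1, 2, 3, 4

inferred = a.reshape(-1, 6)    # -1 asks NumPy to work out this dimension
print(inferred.shape)          # (2, 6)

try:
    a.reshape(5, 2)            # 10 slots cannot hold 12 elements
except ValueError:
    print('cannot reshape 12 elements into shape (5, 2)')
```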

__Exercise:__ Reshape `b` into a 3D array made of 2 sub-arrays with 3 rows and 2 columns.

Arrays can also be transposed with `.T` and flattened with `.ravel()`.

In [47]:
print(b)
print('')
print(b.T)

In [48]:
b.ravel()
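
Combining the two shows how the order of elements depends on the layout. In this small sketch (the array `m` is made up for the example), flattening the transpose walks down the columns of the original array.

```python
import numpy as np

m = np.arange(1, 7).reshape(2, 3)   # [[1, 2, 3], [4, 5, 6]]

print(m.T.shape)       # (3, 2): rows and columns are swapped
print(m.ravel())       # 1, 2, 3, 4, 5, 6
print(m.T.ravel())     # 1, 4, 2, 5, 3, 6
```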

## 3. Data types

In addition to the standard python numeric data types (`int`, `float` and `complex`), NumPy adds dtypes with predefined precision. Users can leverage these dtypes to save memory when working with large arrays.

Some examples:
- `int16`   Integer (-32768 to 32767)
- `int32`   Integer (-2147483648 to 2147483647)
- `int64`   Integer (-9223372036854775808 to 9223372036854775807)
- `float16`   Half precision float: sign bit, 5 bits exponent, 10 bits mantissa
- `float32`   Single precision float: sign bit, 8 bits exponent, 23 bits mantissa
- `float64`   Double precision float: sign bit, 11 bits exponent, 52 bits mantissa

For the full list, see https://docs.scipy.org/doc/numpy/user/basics.types.html

In [51]:
large_array = np.arange(10000).reshape(100,100)
print(large_array)

In [52]:
large_array.dtype

To change the `dtype`, we can apply the method `.astype()`

In [54]:
large_array = large_array.astype('int32')
large_array

In [55]:
large_array = large_array.astype('float16')
large_array
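
The memory saving can be measured with the `.nbytes` attribute. The sketch below starts from an explicit `int64` array so the numbers do not depend on the platform default, and it also shows the trade-off: `float16` cannot represent every integer in this range exactly.

```python
import numpy as np

values = np.arange(10000, dtype=np.int64).reshape(100, 100)

print(values.nbytes)                     # 80000 bytes (8 per element)
print(values.astype(np.int32).nbytes)    # 40000 bytes (4 per element)

half = values.astype(np.float16)
print(half.nbytes)                       # 20000 bytes (2 per element)

# Smaller dtypes trade precision for memory: 9999 is rounded to 10000.
print(values[-1, -1], half[-1, -1])
```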

__Exercise:__ Convert `large_array` back to one of the integer dtypes

Arrays can also be created with booleans or strings. Note that the string dtype stores unicode strings with a fixed maximum length, which is set by the longest string in the input.

In [59]:
bools = np.array([True, False, True, True])
bools

In [60]:
bools.dtype

In [61]:
np.array(['a', 'e', 'c', 'd'])

In [62]:
np.array(['one', 'two', 'three'])
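
The fixed maximum length has a practical consequence, sketched below: assigning a longer string to an existing element silently truncates it. The array name `words` is only illustrative.

```python
import numpy as np

words = np.array(['one', 'two', 'three'])

print(words.dtype.kind)       # 'U' marks a unicode string dtype
print(words.dtype.itemsize)   # 20 bytes: 5 characters at 4 bytes each

words[0] = 'eleven'           # 6 characters do not fit in a 5-character slot
print(words[0])               # 'eleve'
```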

## 4. Basic operations

##### Elementwise operations
Mathematical operators on arrays apply elementwise. A new array is created and filled with the result.

In [65]:
a = np.array([20, 30, 40, 50])
b = np.array([5, 10, 15, 20])
c = a - b
c

In [66]:
b ** 2

In [67]:
a/10

In [68]:
a < 35

When using multiple arrays, by default the shape needs to be the same. _See the later section on broadcasting for exceptions_.

__Exercise:__ Fix the code below to allow addition.

In [70]:
a = np.array([[2, 4, 5],
              [1, 1, 0]])

b = np.array([5, 6, 7, 8, 9, 10])
a + b

By default, multiplication happens elementwise. We won't cover matrices in detail here, but matrix multiplication can be achieved by using the `.dot()` method.

In [72]:
a = np.array( [[1, 1],
               [0, 1]] )
b = np.array( [[2, 0],
               [3, 4]] )

a * b

In [73]:
a.dot(b)
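
For comparison, the sketch below repeats the two products on the same arrays and adds the `@` operator, which performs the same matrix multiplication as `.dot()` for 2-D arrays.

```python
import numpy as np

a = np.array([[1, 1],
              [0, 1]])
b = np.array([[2, 0],
              [3, 4]])

elementwise = a * b     # [[2, 0], [0, 4]]
matrix = a.dot(b)       # [[5, 4], [3, 4]]
with_operator = a @ b   # same result as a.dot(b)

print(elementwise)
print(matrix)
print(np.array_equal(matrix, with_operator))   # True
```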

__Exercise:__ Create an array of integers and an array of floats and subtract the two. What do you notice about the `dtype` of the resulting array?

In [75]:
ints = np.array(  )
floats = np.array(  )

ints - floats

##### Modifying arrays in-place

Some operations, such as `+=` and `*=`, act in place to modify an existing array rather than create a new one.

In [77]:
a = np.ones((2,3), dtype=int)
a

In [78]:
a *= 3
a
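
The difference between modifying in place and creating a new array shows up when a second name refers to the same array. The sketch below also illustrates one consequence of working in place: the result must fit the existing `dtype`, so adding floats to an integer array in place raises an error. The names `alias` and `cast_failed` are made up for this example.

```python
import numpy as np

a = np.ones((2, 3), dtype=int)
alias = a               # a second name for the same array object

a *= 3                  # modifies the existing array
print(alias[0, 0])      # 3: the change is visible through the alias

a = a * 2               # builds a new array and rebinds the name `a`
print(a[0, 0], alias[0, 0])   # 6 and 3: the original array is untouched

cast_failed = False
try:
    alias += np.full(alias.shape, 0.5)   # float results cannot be stored in an int array
except TypeError:
    cast_failed = True
print(cast_failed)      # True
```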

__Exercise:__ Without calculating it, what would be the output of `a += a`? Try it below to confirm.

##### Unary and by-axis operations

Many unary operations, such as computing the sum of all the elements in the array, are implemented as _methods_ of the ndarray class. By default, these operations apply to the array as though it were a list of numbers, regardless of its shape.

In [82]:
a = np.random.random((2,3))
a

In [83]:
a.sum()

In [84]:
a.min()

__Exercise:__ Find the standard deviation of `a` using `.std()`.

By specifying the `axis` parameter, you can apply an operation along one axis of an array. For a 2-D array, `axis=0` works down the rows and returns one result per column, while `axis=1` works across the columns and returns one result per row.

In [88]:
b = np.arange(12).reshape(3,4)
b

In [89]:
b.sum(axis=0)

In [90]:
b.sum(axis=1)

In [91]:
b.min(axis=0)

In [92]:
b.min(axis=1)
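
One way to keep the axes straight is to look at the shape of the result: the axis passed to the method is the one that disappears. A small sketch using the same 3 by 4 array (the names `col_totals` and `row_totals` are illustrative):

```python
import numpy as np

b = np.arange(12).reshape(3, 4)

col_totals = b.sum(axis=0)   # collapses the 3 rows: one total per column
row_totals = b.sum(axis=1)   # collapses the 4 columns: one total per row

print(col_totals, col_totals.shape)   # [12 15 18 21] (4,)
print(row_totals, row_totals.shape)   # [ 6 22 38] (3,)
```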

## 5. Functions and broadcasting

##### Universal Functions

NumPy provides familiar mathematical functions such as `sin` and `exp`, which operate elementwise and produce an array as an output.

In [95]:
b = np.arange(6).reshape((3,2))
b

In [96]:
np.exp(b)

__Exercise:__ Use the `np.sqrt` function to take the square root of `b`.

##### Broadcasting

Broadcasting allows universal functions to deal in a meaningful way with inputs that do not have exactly the same shape. See the documentation for details: https://docs.scipy.org/doc/numpy-dev/user/basics.broadcasting.html.

Broadcasting follows these rules:
1. All input arrays with `ndim` smaller than the input array of largest `ndim`, have 1’s prepended to their shapes. So an array with 2 rows and 3 columns, shape `(2, 3)`, acts as if it has shape `(1, 1, 1, ..., 2, 3)`.
1. The size in each dimension of the output shape is the maximum of all the input sizes in that dimension.
1. An input can be used in the calculation if its size in a particular dimension either matches the output size in that dimension, or has value exactly 1. 
1. If an input has a size of 1 along any dimension, the value of the array element is assumed to be the same along that dimension for the “broadcast” array. In other words, the array is repeated along that dimension until it matches the size of the larger array in that dimension.
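
The rules above can be tried out without doing any arithmetic. The sketch below assumes NumPy 1.20 or later, where `np.broadcast_shapes()` reports the output shape for a set of input shapes; `np.broadcast_to()` shows the repetition described in the last rule. The shapes are arbitrary examples.

```python
import numpy as np

# A 1-D shape gets a 1 prepended: (4,) acts like (1, 4) next to (2, 4).
shape_a = np.broadcast_shapes((4,), (2, 4))
print(shape_a)    # (2, 4)

# A size of 1 stretches to match the other input.
shape_b = np.broadcast_shapes((4, 1), (4, 2))
print(shape_b)    # (4, 2)

# The stretched array simply repeats its values.
row = np.array([1, 2, 3, 4])
tiled = np.broadcast_to(row, (2, 4))
print(tiled)      # the same row, twice

# Sizes that differ and are not 1 are incompatible.
incompatible = False
try:
    np.broadcast_shapes((2, 3), (6,))
except ValueError:
    incompatible = True
print(incompatible)   # True
```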

In [101]:
one_by_four = np.array([1, 2, 3, 4])
one_by_four.shape

In [102]:
two_by_four = np.array([[5, 10, 15, 20],
                        [9, 1, 12, 5]])
two_by_four.shape

In [103]:
one_by_four + two_by_four

In [104]:
four_by_one = one_by_four.reshape((4,1))
print(four_by_one)

print('')

four_by_two = two_by_four.T
print(four_by_two)

In [105]:
four_by_two - four_by_one

__Exercise:__ Create an array with shape `(5, 3)` and an array with shape `(3, 5, 3)`. Can these be multiplied? What will be the result? 

Try it below.

## 6. Indexing and slicing

##### One-dimensional arrays

One-dimensional arrays can be indexed, sliced and iterated over, much like lists and other Python sequences. _Reminder, in Python all indexes start at 0._

In [111]:
a = np.arange(10)**3
a

In [112]:
a[2]

In [113]:
a[2:5]

The cell below is equivalent to `a[0:6:2] = 1000`: the step of `2` selects every 2nd item from index 0 up to (but not including) index 6, and those items are replaced with 1000.

In [115]:
a[:6:2] = 1000
a

Just as `[::2]` selects every 2nd item, `[::-1]` selects every item in reverse order:

In [117]:
a[ : :-1]
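
The `start:stop:step` pattern can be summarised on a fresh copy of the same data. This is a small sketch; the name `cubes` is illustrative.

```python
import numpy as np

cubes = np.arange(10) ** 3    # 0, 1, 8, 27, 64, 125, 216, 343, 512, 729

print(cubes[2:5])       # indices 2, 3, 4      -> 8, 27, 64
print(cubes[:6:2])      # indices 0, 2, 4      -> 0, 8, 64
print(cubes[::-1][:3])  # last three, reversed -> 729, 512, 343
```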

In [118]:
for i in a:
  print(i/10)

##### Multidimensional arrays

Multidimensional arrays can have one index per axis. These indices are given in a tuple separated by commas.

In [120]:
b = np.arange(10).reshape(2, 5)
b

In [121]:
b[1, 3]

In [122]:
b[0, 0:3]

In [123]:
b*0

In [124]:
b[-1] # The last row of b, equivalent to b[-1, :]
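
The same indexing patterns are collected in the sketch below, together with a bare `:` that selects everything along an axis, which is how a single column can be picked out.

```python
import numpy as np

b = np.arange(10).reshape(2, 5)   # [[0, 1, 2, 3, 4], [5, 6, 7, 8, 9]]

print(b[1, 3])      # row 1, column 3          -> 8
print(b[0, 0:3])    # row 0, first 3 columns   -> 0, 1, 2
print(b[:, 1])      # every row, column 1      -> 1, 6
print(b[-1])        # last row, same as b[-1, :]
```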

Iterating over multidimensional arrays is done with respect to the first axis:

In [126]:
for row in b:
  print(row, row.sum())

To iterate over every element in a multidimensional array, use the `.flat` attribute.

In [128]:
for element in b.flat:
  print(element*10)

__Exercise:__ How would you iterate over columns of `b`? Return the printed columns.

_Hint: Feel free to alter the layout of `b`_

__The End__