# TO DO 2
Notes on the NumPy Library (from the Python Data Science Handbook)  
September 7, 2017

# NumPy: Numerical Python

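- built to store and operate efficiently on dense data buffers
- `np` is the conventional import alias (`import numpy as np`)
- Storing data:
    - Python lists are dynamically typed: each element can be a different type
    - NumPy arrays are uniformly typed: every element shares one dtype
- NumPy is constrained to uniformly typed arrays
    - NumPy's core is implemented in C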
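A minimal sketch of the list-versus-array difference described above. The variable names are illustrative; the point is that a list keeps each element's own Python type, while `np.array` picks one dtype for the whole array and upcasts mixed numeric input to it.

```python
import numpy as np

# A Python list can hold mixed types; each element is a full Python object
mixed = [1, "two", 3.0]
print([type(v).__name__ for v in mixed])   # ['int', 'str', 'float']

# A NumPy array has a single dtype; mixed numeric input is upcast to a common type
arr = np.array([1, 2.5, 3])
print(arr.dtype)                           # float64
```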

## NumPy Arrays
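- arrays have the following attributes (among others):
    1. ndim = number of dimensions
    2. shape = size of each dimension
    3. size = total number of elements
    4. dtype = data type of the elements
- indexing is zero-based: the first element is at index 0
    - negative indices count from the end (`x[-1]` is the last element)
    - multidimensional arrays are indexed with comma-separated tuples, e.g. `x[0, 1]`
- array values are mutable
- array slicing notation: ```x[startInd:stopInd:step]```  
- slices of NumPy arrays are views, not copies (unlike Python list slices): modifying a slice modifies the original array, so use `.copy()` when an independent copy is needed
- arrays can be reshaped using the `reshape()` method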
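The attributes, indexing rules, and slicing behavior listed above can be sketched on a small 2D array. This is an illustrative example (the names `x`, `row_view`, and `row_copy` are arbitrary); note that the integer dtype printed depends on the platform.

```python
import numpy as np

x = np.arange(12).reshape((3, 4))   # 2D array: 3 rows, 4 columns

# Array attributes
print(x.ndim)     # 2
print(x.shape)    # (3, 4)
print(x.size)     # 12
print(x.dtype)    # platform default integer, e.g. int64

# Zero-based, negative, and tuple indexing
print(x[0, 0])    # 0  (first row, first column)
print(x[-1, -1])  # 11 (last row, last column)

# Slicing with start:stop:step returns a view onto the same data
row_view = x[0, ::2]     # elements 0 and 2 of the first row
row_view[0] = 99         # this also changes x
print(x[0, 0])           # 99

# .copy() gives an independent array
row_copy = x[1, :].copy()
row_copy[0] = -1
print(x[1, 0])           # 4 (unchanged)
```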

```python
# From Python Data Science Handbook
import numpy as np
grid = np.arange(1, 10).reshape((3, 3))
print(grid)
```

```
[[1 2 3]
 [4 5 6]
 [7 8 9]]
```


- other useful NumPy functions:
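    - concatenate() joins a sequence of arrays along an existing axis
    - vstack() and hstack() stack arrays vertically and horizontally, which is clearer when the arrays have mixed dimensions
    - split() is the inverse of concatenation: it breaks an array into subarrays at the given indices
    - vsplit() and hsplit() split along rows (axis 0) and columns (axis 1), respectively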
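A short sketch of the joining functions listed above, using made-up arrays. `vstack` needs the 1D array's length to match the 2D array's row length, and `hstack` needs one entry per row in the appended column.

```python
import numpy as np

a = np.array([1, 2, 3])
b = np.array([4, 5, 6])
print(np.concatenate([a, b]))      # [1 2 3 4 5 6]

grid = np.array([[9, 8, 7],
                 [6, 5, 4]])

# vstack: put a 1D array on top of a 2D array as a new row
print(np.vstack([a, grid]))
# [[1 2 3]
#  [9 8 7]
#  [6 5 4]]

# hstack: append a column with one entry per row of grid
col = np.array([[99],
                [99]])
print(np.hstack([grid, col]))
# [[ 9  8  7 99]
#  [ 6  5  4 99]]
```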

```python
# From Python Data Science Handbook
x = [1, 2, 3, 99, 99, 3, 2, 1]
x1, x2, x3 = np.split(x, [3, 5])
print(x1, x2, x3)
```

```
[1 2 3] [99 99] [3 2 1]
```


- splitting arrays:
    - passing N split points returns N + 1 subarrays
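The N-points-give-N+1-pieces rule also applies to `vsplit()` and `hsplit()`. A minimal sketch on an illustrative 4×4 grid:

```python
import numpy as np

grid = np.arange(16).reshape((4, 4))

# One split point -> two pieces
upper, lower = np.vsplit(grid, [2])   # split rows before row index 2
left, right = np.hsplit(grid, [2])    # split columns before column index 2
print(upper.shape, lower.shape)       # (2, 4) (2, 4)
print(left.shape, right.shape)        # (4, 2) (4, 2)

# Two split points -> three pieces
pieces = np.split(np.arange(10), [2, 7])
print([p.tolist() for p in pieces])   # [[0, 1], [2, 3, 4, 5, 6], [7, 8, 9]]
```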

## Computation on NumPy Arrays
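- NumPy makes computation fast through "vectorized" operations
    - these are implemented as universal functions (ufuncs)
    - Python's standard arithmetic operators (`+`, `-`, `*`, `/`, `**`, ...) call the corresponding ufuncs (e.g. `+` calls `np.add`)
- a ufunc applied to an array performs its operation on each element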
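A small sketch of the operator-to-ufunc correspondence described above. The pairs shown (`np.add`, `np.multiply`, `np.floor_divide`, `np.negative`) are standard NumPy ufuncs; the array is arbitrary.

```python
import numpy as np

x = np.arange(4)        # [0 1 2 3]

# Each operator applies element-wise and is backed by a ufunc
print(x + 5)            # [5 6 7 8]        same as np.add(x, 5)
print(x * 2)            # [0 2 4 6]        same as np.multiply(x, 2)
print(x // 2)           # [0 0 1 1]        same as np.floor_divide(x, 2)
print(-x)               # [ 0 -1 -2 -3]    same as np.negative(x)

# The equivalence can be checked directly
print(np.array_equal(x + 5, np.add(x, 5)))   # True
```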

```python
x = np.arange(1, 10)
print(x)
print('x ** 2 = ', x ** 2)
```

```
[1 2 3 4 5 6 7 8 9]
x ** 2 =  [ 1  4  9 16 25 36 49 64 81]
```


- other computations:
    - absolute value
    - trigonometric functions
    - exponential and logarithmic functions
- SciPy is a separate library built on top of NumPy (not a submodule of it)
    - its `scipy.special` submodule provides many additional ufuncs
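A sketch of the absolute-value, trigonometric, and exponential/logarithmic ufuncs listed above, using NumPy only (SciPy is left out so the example runs without it). Floating-point results are only approximately exact, which the comments note.

```python
import numpy as np

x = np.array([-2, -1, 0, 1, 2])
print(np.abs(x))                    # [2 1 0 1 2]

theta = np.array([0, np.pi / 2, np.pi])
print(np.sin(theta))                # approximately [0, 1, 0], up to floating-point error

y = np.array([1.0, 2.0, 3.0])
print(np.exp(y))                    # e**y, element-wise
print(np.log(np.exp(y)))            # recovers y up to rounding
print(np.log10([10, 100, 1000]))    # [1. 2. 3.]
```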
    

## Aggregations
- Sum
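    - Python's built-in sum() and NumPy's np.sum() compute the same total on a 1D array (up to floating-point rounding), but they are different functions
    - np.sum() runs in compiled code and is much faster on large arrays; it also accepts an `axis` argument
- Min and Max
    - prefer NumPy's np.min()/np.max() (or the array methods `x.min()`/`x.max()`) over Python's built-in min()/max()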
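A minimal sketch of these aggregations. It assumes NumPy 1.17 or later for `np.random.default_rng`; the seed and array sizes are arbitrary. No timings are shown, only that the results agree.

```python
import numpy as np

rng = np.random.default_rng(0)     # seeded so the run is repeatable
big = rng.random(100_000)

# Same total up to rounding; np.sum does the work in compiled code
print(np.isclose(sum(big), np.sum(big)))   # True

# NumPy's min/max in function or method form
print(big.min() == np.min(big))            # True
print(big.max() == np.max(big))            # True

# NumPy aggregates can also work along one axis
m = np.arange(6).reshape((2, 3))           # [[0 1 2] [3 4 5]]
print(m.sum(axis=0))                       # [3 5 7]  column sums
print(m.max(axis=1))                       # [2 5]    row maxima
```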


## Broadcasting
- a set of rules for applying binary ufuncs (addition, subtraction, multiplication, etc.) to arrays of different shapes


```python
print(np.arange(3) + 5)
print(np.ones((3, 3)) + np.arange(3))
print(np.arange(3).reshape((3, 1)) + np.arange(3))
```

```
[5 6 7]
[[ 1.  2.  3.]
 [ 1.  2.  3.]
 [ 1.  2.  3.]]
[[0 1 2]
 [1 2 3]
 [2 3 4]]
```


- Rules of Broadcasting (from the Python Data Science Handbook)
    1. If the two arrays differ in their number of dimensions, the shape of the one with fewer dimensions is padded with ones on its leading (left) side.
    2. If the shape of the two arrays does not match in any dimension, the array with shape equal to 1 in that dimension is stretched to match the other shape.
    3. If in any dimension the sizes disagree and neither is equal to 1, an error is raised.
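The three rules can be checked on concrete shapes. This sketch uses illustrative arrays; the exact wording of NumPy's error message is not relied on.

```python
import numpy as np

M = np.ones((2, 3))
a = np.arange(3)                     # shape (3,)

# Rule 1 pads a to (1, 3); Rule 2 stretches it to (2, 3)
print((M + a).shape)                 # (2, 3)

col = np.arange(3).reshape((3, 1))   # shape (3, 1)
# Both arrays are stretched: (3, 1) and (1, 3) -> (3, 3)
print((col + a).shape)               # (3, 3)

# Rule 3: (3, 2) vs (1, 3) disagree in the last dimension and neither is 1
bad = np.ones((3, 2))
try:
    bad + a
    broadcast_ok = True
except ValueError as err:
    broadcast_ok = False
    print("ValueError:", err)
```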

## Comparisons, Masks, and Boolean Logic
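- NumPy implements comparison operators (`<`, `>`, `<=`, `>=`, `==`, `!=`) as element-wise ufuncs that return Boolean arrays
    - counting functions such as np.count_nonzero(), np.sum(), np.any(), and np.all() help summarize those Boolean arrays
- Boolean arrays can be used as masks to select elements from multidimensional arrays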
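A sketch of comparisons, counting, and masking on a small made-up 2D array. Conditions are combined with the bitwise operators `&` and `|`, which need parentheses around each comparison.

```python
import numpy as np

x = np.array([[5, 0, 3, 3],
              [7, 9, 3, 5],
              [2, 4, 7, 6]])

mask = x < 5                            # element-wise comparison -> Boolean array
print(mask.dtype)                       # bool

# Counting: True counts as 1
print(np.count_nonzero(mask))           # 6
print(np.sum(mask, axis=1))             # [3 1 2]  per-row counts
print(np.any(x > 8), np.all(x < 10))    # True True

# Combine conditions with & and | (parentheses are required)
print(np.sum((x > 2) & (x < 6)))        # 6

# Masking: keep only the elements where the mask is True (result is 1D)
print(x[mask])                          # [0 3 3 3 2 4]
```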


## Fancy Indexing
- passing an array of indices to access multiple elements at once
- the shape of the result reflects the shape of the fancy index, rather than the shape of the array being indexed
- can be combined with other indexing schemes


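```python
x = np.arange(1000)
# without fancy indexing: one scalar lookup per element
[x[1], x[67], x[2]]
```

```
[1, 67, 2]
```

```python
# fancy indexing: pass all the indices at once as a list or array
x[[1, 67, 2]]
```

```
array([ 1, 67,  2])
```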
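Beyond the element-by-element lookup above, a short sketch of the two other points: the result takes the shape of the index array, and fancy indexing combines with slices and Boolean masks. The arrays are illustrative.

```python
import numpy as np

x = np.arange(10) * 10             # [ 0 10 20 ... 90]

# The result takes the shape of the index array, not of x
ind = np.array([[3, 7],
                [4, 5]])
print(x[ind])
# [[30 70]
#  [40 50]]

X = np.arange(12).reshape((3, 4))

# Fancy index on rows combined with a slice on columns
print(X[[0, 2], 1:])
# [[ 1  2  3]
#  [ 9 10 11]]

# Paired row/column index arrays pick individual elements
print(X[[0, 1, 2], [2, 1, 3]])     # [ 2  5 11]

# A slice on rows combined with a Boolean mask on columns
print(X[1:, [True, False, True, False]])
# [[ 4  6]
#  [ 8 10]]
```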

## Sorting
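- NumPy implements its own sorting functions, np.sort() and np.argsort()
    - np.sort() defaults to a quicksort-based O(N log N) algorithm (`kind='quicksort'`); mergesort/stable and heapsort are also available
- the `axis` argument sorts each row or each column independently
- np.partition() performs a partial sort: given K, the K smallest values end up to the left of position K, in arbitrary order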
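A minimal sketch of the sorting tools above on arbitrary data. Because `np.partition` leaves each side in arbitrary order, the example sorts the first K values before printing them rather than relying on their order.

```python
import numpy as np

x = np.array([7, 2, 3, 1, 6, 5, 4])
print(np.sort(x))                      # [1 2 3 4 5 6 7]  a sorted copy
print(np.argsort(x))                   # [3 1 2 6 5 4 0]  indices that would sort x
x_stable = np.sort(x, kind="stable")   # request a stable algorithm instead

# axis=0 sorts each column, axis=1 sorts each row, independently
X = np.array([[3, 1, 2],
              [0, 5, 4]])
print(np.sort(X, axis=0))
# [[0 1 2]
#  [3 5 4]]
print(np.sort(X, axis=1))
# [[1 2 3]
#  [0 4 5]]

# Partial sort: the 3 smallest values come first, in arbitrary order
p = np.partition(x, 3)
print(np.sort(p[:3]))                  # [1 2 3]
```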

## Structured Data
- structured arrays (compound data types) store related fields of different data types, such as a name, an age, and a weight for each record

```python
# From Python Data Science Handbook
data = np.zeros(4, dtype={'names':('name', 'age', 'weight'),
                          'formats':('U10', 'i4', 'f8')})
print(data.dtype)
```

```
[('name', '<U10'), ('age', '<i4'), ('weight', '<f8')]
```


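- `U10` = Unicode string of maximum length 10
- `i4` = 4-byte (32-bit) integer
- `f8` = 8-byte (64-bit) float
- `<` = little-endian byte order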
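A sketch of filling and querying a structured array with the same compound dtype as the example above. The names, ages, and weights are made-up illustration values.

```python
import numpy as np

# Same compound dtype as above; the values are illustrative only
data = np.zeros(4, dtype={'names': ('name', 'age', 'weight'),
                          'formats': ('U10', 'i4', 'f8')})
data['name'] = ['Alice', 'Bob', 'Cathy', 'Doug']
data['age'] = [25, 45, 37, 19]
data['weight'] = [55.0, 85.5, 68.0, 61.5]

print(data[0])                          # one record (all fields of row 0)
print(data['name'])                     # one field across all records
print(data[data['age'] < 30]['name'])   # Boolean mask on a field: ['Alice' 'Doug']
```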