# Intro to numpy
`numpy` is a module for performing optimized vector operations on arrays of *n* dimensions. Array operations in `numpy` can be up to 50 times faster than equivalent operations on regular Python lists.
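The speed comes largely from vectorization: an arithmetic operator applied to an ndarray acts on every element in compiled code, rather than in a Python-level loop. The sketch below compares the two styles. The array size and repeat count are arbitrary choices, and the measured ratio will vary by machine and operation, so treat the timings as illustrative rather than as a benchmark.

```python
import timeit

import numpy as np

values = list(range(100_000))
arr = np.arange(100_000)

# plain Python: an explicit loop over every element
doubled_list = [v * 2 for v in values]

# numpy: one vectorized expression, applied element-wise
doubled_arr = arr * 2

# both approaches produce the same numbers
print(doubled_arr.tolist() == doubled_list)  # True

# rough timing comparison (results vary by machine)
loop_time = timeit.timeit(lambda: [v * 2 for v in values], number=20)
vector_time = timeit.timeit(lambda: arr * 2, number=20)
print(f"list comprehension: {loop_time:.4f}s, numpy: {vector_time:.4f}s")
```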

## basic concepts
- arrays in `numpy` are referred to in documentation as ndarrays, for n-dimensional arrays.
- each array stores values of a single data type, called that array's `dtype` (for example, an integer, floating-point, string, or Python object type).
- each value in an `ndarray` is indexed, starting from 0.
- a variety of functions exist for creating `ndarray`s.
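A short sketch of the concepts above: mixing integers and floats in one list yields a single floating-point `dtype`, and elements are read by zero-based position (negative positions count from the end).

```python
import numpy as np

# ints and floats are mixed, so numpy converts everything to one dtype
a = np.array([1, 2.5, 3])
print(a.dtype)   # float64

# indexing starts at 0; -1 refers to the last element
print(a[0])      # 1.0
print(a[-1])     # 3.0

# in two dimensions, an index is a (row, column) pair
m = np.array([[11, 22, 33], [45, 90, 6]])
print(m[1, 2])   # 6
```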

## import numpy

In [2]:
import numpy as np

## creating ndarrays
There are a variety of ways to create ndarrays in `numpy`.

In [97]:
# create a one-dimensional array from a regular Python list
np.array([23,56,2])

array([23, 56,  2])

In [98]:
# create a two-dimensional array from a regular two-dimensional Python list
np.array([
    [11,22,33], 
    [45, 90, 6]]
)

array([[11, 22, 33],
       [45, 90,  6]])

In [193]:
# create an array of a particular shape, filled with random floats in the interval [0.0, 1.0)
np.random.random_sample( (3, 4) )  # 3 rows, 4 columns

array([[0.10219055, 0.85492188, 0.12405458, 0.87491665],
       [0.21467651, 0.85459851, 0.87119142, 0.13311813],
       [0.9023749 , 0.44038876, 0.25434983, 0.6081812 ]])
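Because `random_sample()` draws fresh values on every call, the output above will differ each time the cell runs. When reproducible results are needed, one option (an addition here, not something the original notebook uses) is NumPy's newer `Generator` API, created with `np.random.default_rng()` and a fixed seed. The seed value `42` is arbitrary.

```python
import numpy as np

# two generators created with the same seed produce the same sequence
rng_a = np.random.default_rng(42)
rng_b = np.random.default_rng(42)

sample_a = rng_a.random((3, 4))  # floats in [0.0, 1.0), 3 rows, 4 columns
sample_b = rng_b.random((3, 4))

print(sample_a.shape)                      # (3, 4)
print(np.array_equal(sample_a, sample_b))  # True
```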

In [43]:
# create an array with zeros as values of a particular shape and dtype
np.zeros( (3, 4), dtype = int ) # 3 rows, 4 columns

array([[0, 0, 0, 0],
       [0, 0, 0, 0],
       [0, 0, 0, 0]])

In [44]:
# create an array with ones as values of a particular shape and dtype
np.ones( (3,4), dtype = int ) # 3 rows, 4 columns

array([[1, 1, 1, 1],
       [1, 1, 1, 1],
       [1, 1, 1, 1]])

In [46]:
# create an uninitialized array of a particular shape and dtype; its values are whatever happens to be in memory, so don't rely on them
np.empty( [3, 4], dtype=int ) # 3 rows, 4 columns

array([[-8070450532247928832, -6917520248302721824,                    8,
                           0],
       [                   0,                    0,                    0,
                           0],
       [                   0,                    0,                    0,
                           0]])

In [195]:
# create an array of values from a start value up to (but not including) a stop value, with a specific step
np.arange( 10, 40, 5 ) # values from 10 up to 40 (exclusive), in steps of 5

array([10, 15, 20, 25, 30, 35])

In [194]:
# create an array with a specified number of evenly spaced values between a start and a stop value
np.linspace( 10, 40, 5 ) # 5 values from 10 up to 40 (inclusive)

array([10. , 17.5, 25. , 32.5, 40. ])
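The two functions differ in what you specify: `arange()` takes a step size and excludes the stop value, while `linspace()` takes a count of values and includes the stop value by default. A quick comparison; `endpoint=False` is a `linspace()` parameter that excludes the stop value instead.

```python
import numpy as np

# arange: you choose the step; 40 is excluded
stepped = np.arange(10, 40, 5)
print(stepped)          # [10 15 20 25 30 35]

# linspace: you choose how many values; 40 is included by default
spaced = np.linspace(10, 40, 5)
print(spaced[-1])       # 40.0

# endpoint=False leaves out the stop value, so the spacing becomes 30 / 5 = 6
spaced_open = np.linspace(10, 40, 5, endpoint=False)
print(spaced_open)      # [10. 16. 22. 28. 34.]
```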

## introspection
Find out some metadata about an array.

In [196]:
# one-dimensional array
x = np.array([23,56,2])
x

array([23, 56,  2])

In [7]:
# two-dimensional array
y = np.array([
    [11,22,33], 
    [45, 90, 6]]
)
y

array([[11, 22, 33],
       [45, 90,  6]])

In [102]:
# check the number of dimensions in y
y.ndim

2

In [104]:
# check the shape of y, which has two rows, three columns
y.shape

(2, 3)

In [14]:
# check the data type of the values in x
x.dtype

dtype('int64')

In [15]:
# change the dtype: astype() returns a new array and leaves x unchanged
new_x = x.astype(float)
new_x.dtype

dtype('float64')
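Two related details worth knowing: converting floats back to integers with `astype()` truncates toward zero rather than rounding, and the `size` attribute reports the total number of elements regardless of shape. A small sketch:

```python
import numpy as np

x = np.array([23, 56, 2])
new_x = x.astype(float)

# the original array keeps its integer dtype ('i'); the copy is float ('f')
print(x.dtype.kind, new_x.dtype.kind)  # i f

# float -> int conversion drops the fractional part (truncates toward zero)
print(np.array([2.7, -2.7]).astype(int))  # [ 2 -2]

# size counts every element, regardless of shape
y = np.array([[11, 22, 33], [45, 90, 6]])
print(y.size)  # 6
```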

## Reshaping arrays

In [199]:
# two-dimensional array
y = np.array([
    [11,22,33], 
    [45, 90, 6]]
)

In [198]:
# reshape the original array from two rows, three columns into 1 row, 6 columns
y.reshape(1, 6)

array([[11, 22, 33, 45, 90,  6]])
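`reshape()` only works when the new shape holds the same total number of elements, and one dimension may be given as `-1` to let numpy infer it. A sketch with the same `y` array:

```python
import numpy as np

y = np.array([[11, 22, 33], [45, 90, 6]])  # 2 x 3 = 6 elements

# -1 asks numpy to work out that dimension: 6 elements / 2 columns = 3 rows
print(y.reshape(-1, 2).shape)  # (3, 2)

# reshape() returns a new array object (often a view of the same data); y keeps its shape
print(y.shape)  # (2, 3)

# a shape with a different element count (4 x 2 = 8) is rejected
try:
    y.reshape(4, 2)
except ValueError as err:
    print("cannot reshape:", err)
```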

In [149]:
# transpose the array, so rows become columns and columns become rows
x = np.array( [ 
    [2, 3, 4], 
    [5, 6, 7] 
] )
x.transpose()

array([[2, 5],
       [3, 6],
       [4, 7]])
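The `.T` attribute is a common shorthand for `transpose()`. Transposing reverses the entries of `shape`, so a one-dimensional array is left as it is:

```python
import numpy as np

x = np.array([[2, 3, 4], [5, 6, 7]])

print(x.shape, x.T.shape)                  # (2, 3) (3, 2)
print(np.array_equal(x.T, x.transpose()))  # True

# transposing a one-dimensional array has no effect
print(np.array([1, 2, 3]).T.shape)         # (3,)
```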

## Merging arrays

In [139]:
# return a new array with values appended to the end of x (x itself is not modified)
x = np.array([2, 3, 4])
np.append(x, [5,6,7])

array([2, 3, 4, 5, 6, 7])
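Without an `axis` argument, `np.append()` flattens its inputs to one dimension first, which can be surprising with two-dimensional data. A sketch contrasting the default with `axis=0`:

```python
import numpy as np

x = np.array([2, 3, 4])
result = np.append(x, [5, 6, 7])
print(x)       # [2 3 4]  (original is untouched)
print(result)  # [2 3 4 5 6 7]

grid = np.array([[1, 2], [3, 4]])

# no axis: both inputs are flattened to 1-D before joining
print(np.append(grid, [[5, 6]]).shape)          # (6,)

# axis=0: the new row is added and the result stays 2-D
print(np.append(grid, [[5, 6]], axis=0).shape)  # (3, 2)
```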

In [128]:
# join two one-dimensional arrays end to end along a specified axis
x = np.array( [2, 3, 4] )
y = np.array( [5, 6, 7] )
np.concatenate( (x, y), axis=0)

array([2, 3, 4, 5, 6, 7])

In [134]:
# join two two-dimensional arrays along a specified axis; their shapes must match in every dimension except that axis
# with axis=0, the arrays are merged 'vertically' (the rows of y go below the rows of x)
x = np.array( [ 
    [2, 3, 4], 
    [5, 6, 7] 
] )
y = np.array( [ 
    [8, 9, 10], 
    [11, 12, 13] 
]  )
np.concatenate( (x, y), axis=0)

array([[ 2,  3,  4],
       [ 5,  6,  7],
       [ 8,  9, 10],
       [11, 12, 13]])

In [135]:
# same as above, but with axis=1, merges 'horizontally'
np.concatenate( (x, y), axis=1)

array([[ 2,  3,  4,  8,  9, 10],
       [ 5,  6,  7, 11, 12, 13]])
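For two-dimensional arrays, `np.vstack()` and `np.hstack()` are convenience functions that give the same results as `concatenate()` with `axis=0` and `axis=1` respectively:

```python
import numpy as np

x = np.array([[2, 3, 4], [5, 6, 7]])
y = np.array([[8, 9, 10], [11, 12, 13]])

print(np.array_equal(np.vstack((x, y)), np.concatenate((x, y), axis=0)))  # True
print(np.array_equal(np.hstack((x, y)), np.concatenate((x, y), axis=1)))  # True
```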

## indexing
Values in an ndarray can be filtered with boolean indexing: supply an array of booleans, one per element, where `True` marks each value to keep.

In [169]:
# filter using an array of boolean values
x = np.array([23,56,2])
mask = [False, True, False]
x[ mask ]

array([56])

In [170]:
# filter using a boolean expression
mask = x > 50
x[ mask ]

array([56])

In [172]:
y = np.array( ['hippopotamus', 'giraffe', 'platypus', 'kiwi', 'albatross' ] )

# select only those animals with a 'p' in their name
mask = np.char.find(y, 'p') >= 0
y[ mask ]

array(['hippopotamus', 'platypus'], dtype='<U12')
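Conditions can be combined with `&` (and), `|` (or), and `~` (not). Each comparison needs its own parentheses, because these operators bind more tightly than `>` and `<`; Python's `and`/`or` keywords do not work element-wise on arrays.

```python
import numpy as np

x = np.array([23, 56, 2, 41, 8])

# values greater than 5 AND less than 50
print(x[(x > 5) & (x < 50)])  # [23 41  8]

# values less than 5 OR greater than 50
print(x[(x < 5) | (x > 50)])  # [56  2]
```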

## removing nan values
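Missing values are commonly represented in `numpy` as `nan` ("not a number"). They can be removed by combining the `isnan()` function with the complement operator, `~`, which inverts a boolean array (turning each `True` into `False` and vice versa).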

In [None]:
# a one-dimensional ndarray with a few nan values
x = np.array([np.nan, 1, 12, np.nan, 3, 41]) 

In [65]:
# first, here's how to keep only the nan values and drop everything else... the opposite of what we want
mask = np.isnan(x) # returns an array, [True, False, False, True, False, False]
x[ mask ] # results in an array with only the two nan values in it

array([nan, nan])

In [63]:
# using the complement operator to invert the logic of the previous filter
mask = np.isnan(x) # returns an array, [True, False, False, True, False, False]
x[ ~mask ] # results in an array with everything except the nan values in it

array([ 1., 12.,  3., 41.])
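When the goal is a statistic rather than a filtered array, numpy's `nan`-aware functions such as `np.nanmean()` skip `nan` values directly, whereas the ordinary `np.mean()` returns `nan` if any value is `nan`:

```python
import numpy as np

x = np.array([np.nan, 1, 12, np.nan, 3, 41])

print(np.mean(x))                # nan
print(np.nanmean(x))             # 14.25
print(np.mean(x[~np.isnan(x)]))  # 14.25, the same result via filtering
```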

## Basic statistics

In [179]:
x = np.array([
    [ 2, 50, 100],
    [ 3, 60,  200],
    [ 4, 55, 150],
    [ 5, 40, 250]
])

In [180]:
# calculate means 'vertically'... the median() function works similarly
np.mean(x, axis=0)

array([  3.5 ,  51.25, 175.  ])

In [190]:
# calculate means 'horizontally'... the median() function works similarly
np.mean(x, axis=1)

array([50.66666667, 87.66666667, 69.66666667, 98.33333333])
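As the comments note, `np.median()` takes the same `axis` argument. With the same `x`:

```python
import numpy as np

x = np.array([
    [2, 50, 100],
    [3, 60, 200],
    [4, 55, 150],
    [5, 40, 250],
])

print(np.median(x, axis=0).tolist())  # [3.5, 52.5, 175.0]
print(np.median(x, axis=1).tolist())  # [50.0, 60.0, 55.0, 40.0]
```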

In [188]:
# determine the minimum value 'vertically'... the amax() function works similarly
np.amin(x, axis=0)

array([  2,  40, 100])

In [187]:
# determine the minimum value 'horizontally'... the amax() function works similarly
np.amin(x, axis=1)

array([2, 3, 4, 5])
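Likewise, `np.amax()` finds the largest value along an axis; with no `axis` argument it reduces the whole array to a single value:

```python
import numpy as np

x = np.array([
    [2, 50, 100],
    [3, 60, 200],
    [4, 55, 150],
    [5, 40, 250],
])

print(np.amax(x, axis=0).tolist())  # [5, 60, 250]
print(np.amax(x, axis=1).tolist())  # [100, 200, 150, 250]

# no axis: the maximum over every element
print(np.amax(x))  # 250
```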

In [192]:
# standard deviation 'vertically'
np.std(x, axis=0)

array([ 1.11803399,  7.39509973, 55.90169944])

In [191]:
# standard deviation 'horizontally'
np.std(x, axis=1)

array([ 40.01110957,  82.77009659,  60.49977043, 108.19221578])
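By default `np.std()` computes the population standard deviation (dividing by *N*), which is what the outputs above show. For a sample standard deviation (dividing by *N* - 1), pass `ddof=1`:

```python
import numpy as np

x = np.array([
    [2, 50, 100],
    [3, 60, 200],
    [4, 55, 150],
    [5, 40, 250],
])

population = np.std(x, axis=0)        # ddof=0 is the default
sample = np.std(x, axis=0, ddof=1)    # divides by N - 1 instead of N

# first column: sqrt(5 / 4) vs sqrt(5 / 3)
print(round(population[0], 4), round(sample[0], 4))  # 1.118 1.291
```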