# Intro to numpy
`numpy` is a module for performing optimized vector operations on arrays of *n* dimensions. Array operations in `numpy` can be up to 50 times faster than the equivalent operations on regular Python lists. See [this tutorial](https://www.studytonight.com/numpy/what-is-python-numpy-library) to learn more.
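
The speed difference can be checked informally with a small timing sketch like the one below. The array size and repeat count are arbitrary choices for illustration, and the measured ratio will vary by machine and workload, so treat the numbers as a rough indication rather than a benchmark.

```python
import timeit

import numpy as np

n = 100_000  # arbitrary size, chosen only for illustration
py_list = list(range(n))
np_arr = np.arange(n)

# double every value: list comprehension vs. a vectorized array operation
list_time = timeit.timeit(lambda: [v * 2 for v in py_list], number=20)
array_time = timeit.timeit(lambda: np_arr * 2, number=20)

print(f"list:  {list_time:.4f} s")
print(f"array: {array_time:.4f} s")
```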

## basic concepts
- arrays in `numpy` are referred to as `ndarray`s, for n-dimensional arrays.
- each array stores values of a single data type, called that array's `dtype`, such as `int`, `float`, or `object`.
- each value in an `ndarray` is indexed, starting from 0.
- a variety of functions exist for creating `ndarray`s.
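
As a quick illustration of the points above, the sketch below shows zero-based indexing and the single-`dtype` rule. The variable names are made up for this example; the exact integer `dtype` printed depends on the platform.

```python
import numpy as np

a = np.array([10, 20, 30])
first = a[0]   # indexing starts at 0, so this is 10
last = a[-1]   # negative indexes count from the end, so this is 30

# mixing ints and a float in one array upcasts every value to a float dtype
mixed = np.array([1, 2.5, 3])

print(first, last)
print(a.dtype, mixed.dtype)
```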

## import numpy

In [2]:
import numpy as np

## creating ndarrays

In [38]:
# create an array from a regular Python list
x = np.array([23,56,2])
x

array([23, 56,  2])

In [41]:
# create an array of particular dimensions, with random values
np.random.rand(3,4) # 3 rows, 4 columns

array([[0.10259266, 0.2184147 , 0.38380412, 0.73068939],
       [0.95247627, 0.14601821, 0.58743405, 0.88686438],
       [0.26628684, 0.85087546, 0.9312413 , 0.84844995]])

In [43]:
# create an array with zeros as values of a particular shape and dtype
np.zeros((3,4), dtype = int) # 3 rows, 4 columns

array([[0, 0, 0, 0],
       [0, 0, 0, 0],
       [0, 0, 0, 0]])

In [44]:
# create an array with ones as values of a particular shape and dtype
np.ones((3,4), dtype = int) # 3 rows, 4 columns

array([[1, 1, 1, 1],
       [1, 1, 1, 1],
       [1, 1, 1, 1]])

In [46]:
# create an uninitialized array of a particular shape and dtype (its values are whatever was already in memory)
np.empty( [3, 4], dtype=int ) # 3 rows, 4 columns

array([[-8070450532247928832, -6917520248302721824,                    8,
                           0],
       [                   0,                    0,                    0,
                           0],
       [                   0,                    0,                    0,
                           0]])

In [33]:
# create an array of values from a min to a max value, with a specific step
np.arange(10, 40, 5) # values from 10 up to 40 (exclusive), stepping by 5's

array([10, 15, 20, 25, 30, 35])

In [47]:
# create an array of values from a min to a max value, with a specified number of evenly-spaced values
np.linspace(10, 40, 5) # 5 values from 10 up to 40 (inclusive)

array([10. , 17.5, 25. , 32.5, 40. ])

## basic properties of ndarrays

In [6]:
# one-dimensional array
x = np.array([23,56,2])
x

array([23, 56,  2])

In [7]:
# two-dimensional array
y = np.array([
    [11,22,33], 
    [45, 90, 6]]
)
y

array([[11, 22, 33],
       [45, 90,  6]])

In [9]:
# check the number of dimensions in y
y.ndim

2

In [14]:
# check the data type of the values in x
x.dtype

dtype('int64')

In [15]:
# change the dtype
new_x = x.astype(float)
new_x.dtype

dtype('float64')
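
Two other commonly used properties are `shape` and `size`. The sketch below reuses the two-dimensional array `y` from above and also shows how a single value is addressed by row and column.

```python
import numpy as np

y = np.array([
    [11, 22, 33],
    [45, 90, 6],
])

print(y.shape)  # (2, 3): 2 rows, 3 columns
print(y.size)   # 6: total number of values
print(y[1, 2])  # 6: the value at row 1, column 2 (both counted from 0)
```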

## indexing
It is possible to filter the values in an ndarray using booleans to indicate which values to keep.

In [51]:
# filter using a list of boolean values
x = np.array([23,56,2])
mask = [False, True, False] # named mask to avoid shadowing Python's built-in filter()
x[ mask ]

array([56])

In [53]:
# filter using a boolean expression
x[ x > 50 ]

array([56])
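
Boolean expressions can also be combined. A minimal sketch, assuming the same `x` as above: use `&` for "and" and `|` for "or", and wrap each comparison in parentheses because these operators bind more tightly than `>` and `<`.

```python
import numpy as np

x = np.array([23, 56, 2])

# keep values that satisfy both conditions
both = x[(x > 10) & (x < 50)]    # array([23])

# keep values that satisfy at least one condition
either = x[(x < 10) | (x > 50)]  # array([56,  2])
```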

## removing nan values
Removing `nan` values is straightforward: build a boolean array with `np.isnan`, then invert it with the complement operator, `~`.

In [None]:
# a one-dimensional ndarray with a few nan values
x = np.array([np.nan, 1, 12, np.nan, 3, 41]) 

In [65]:
# first, filter to keep only the nan values and drop everything else... not exactly what we want
mask = np.isnan(x) # returns an array, [True, False, False, True, False, False]
x[ mask ] # results in an array with only the two nan values in it

array([nan, nan])

In [63]:
# use the complement operator to invert the logic of the previous filter
mask = np.isnan(x) # returns an array, [True, False, False, True, False, False]
x[ ~mask ] # results in an array with everything except the nan values in it

array([ 1., 12.,  3., 41.])
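
It may help to see why `np.isnan` is needed here instead of an ordinary comparison: `nan` never compares equal to anything, including itself. A short sketch (the variable names are only for illustration):

```python
import numpy as np

x = np.array([np.nan, 1, 12, np.nan, 3, 41])

# nan is not equal to itself, so == cannot be used to find it
nan_equals_itself = np.nan == np.nan  # False
naive_mask = x == np.nan              # every entry is False

# np.isnan finds the nan values; ~ flips the mask to keep the rest
cleaned = x[~np.isnan(x)]             # array([ 1., 12.,  3., 41.])
```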

In [None]:
# an example with a two-dimensional array
x = np.array([[np.nan, 1, 12], [3, 41, np.nan]])
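
With a two-dimensional array, the same boolean filter still works, but the result is flattened to one dimension because the surviving values no longer form a rectangle. The sketch below shows that, plus one possible way to drop whole columns that contain a `nan`; the `any(axis=0)` approach is an illustrative choice, not the only option.

```python
import numpy as np

x = np.array([[np.nan, 1, 12], [3, 41, np.nan]])
mask = np.isnan(x)  # same shape as x, True wherever a value is nan

# boolean indexing on a 2-D array returns a flattened 1-D result
flat = x[~mask]     # array([ 1., 12.,  3., 41.])

# keep only the columns in which no value is nan
nan_columns = mask.any(axis=0)    # array([ True, False,  True])
clean_columns = x[:, ~nan_columns]  # only the middle column survives
```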