# Intro to numpy
`numpy` is a module for performing optimized vector operations on arrays of *n* dimensions. Array operations in `numpy` can be up to 50 times faster than equivalent operations on regular Python lists, because the work is done in optimized, compiled code rather than in Python loops.

## Set up environment
The following command ensures that `numpy` is installed in the local environment. If `numpy` is already installed on your system, this step is unnecessary and can be skipped.

In [None]:
!pip install numpy

## Basic concepts
- arrays in `numpy` are referred to in documentation as **ndarrays**, for n-dimensional arrays.
- each array stores values of a single data type, called that array's **dtype**: usually an integer type (such as `int64`), a floating-point type (such as `float64`), or an object type
- like Python lists, each value in an ndarray is indexed with an integer, starting from 0.
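A quick sketch of the single-dtype rule: when a list mixes integers and floats, `numpy` converts (upcasts) every value to a float so that the whole array shares one dtype.

```python
import numpy as np

# a list that mixes ints and a float
mixed = np.array([1, 2.5, 3])

print(mixed.dtype)  # float64: the ints were converted to floats
print(mixed[0])     # indexing starts at 0, so this is the first value: 1.0
```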

## Import numpy

In [65]:
import numpy as np

## Creating ndarrays
`numpy` offers a variety of ways to create ndarrays.

In [5]:
# create a one-dimensional array from a regular Python list
np.array( [23, 56, 2] )

array([23, 56,  2])

In [6]:
# create a two-dimensional array from a regular two-dimensional Python list
np.array(
    [
        [11,22,33], 
        [45, 90, 6]
    ]
)

array([[11, 22, 33],
       [45, 90,  6]])

In [7]:
# create an array of particular dimensions, filled with random floats in the interval [0.0, 1.0)
np.random.random_sample( (3, 4) )  # 3 rows, 4 columns

array([[0.85474603, 0.09209649, 0.38772521, 0.32038834],
       [0.39198053, 0.82824958, 0.39822133, 0.09713616],
       [0.85481038, 0.21292715, 0.21046755, 0.42055327]])
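`np.random.random_sample()` belongs to NumPy's legacy random API. Newer code typically creates a `Generator` with `np.random.default_rng()`. Passing a seed (42 here is an arbitrary choice) makes the "random" values reproducible between runs, which helps when sharing a notebook.

```python
import numpy as np

# create a random number generator; the seed makes results repeatable
rng = np.random.default_rng(seed=42)

# 3 rows, 4 columns of floats in the half-open interval [0.0, 1.0)
values = rng.random((3, 4))
print(values.shape)  # (3, 4)

# a second generator with the same seed produces identical values
same = np.random.default_rng(seed=42).random((3, 4))
print(np.array_equal(values, same))  # True
```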

In [8]:
# create an array with zeros as values of a particular shape and dtype
np.zeros( (3, 4), dtype = float ) # 3 rows, 4 columns

array([[0., 0., 0., 0.],
       [0., 0., 0., 0.],
       [0., 0., 0., 0.]])

In [9]:
# create an array with ones as values of a particular shape and dtype
np.ones( (3,4), dtype = int ) # 3 rows, 4 columns

array([[1, 1, 1, 1],
       [1, 1, 1, 1],
       [1, 1, 1, 1]])

In [10]:
# create an uninitialized array of a particular shape and dtype; its values are whatever happened to be in memory, so don't rely on them
np.empty( [3, 4], dtype=int ) # 3 rows, 4 columns

array([[4605874087312961977, 4591300672283200000, 4600656256372470980,
        4599443223212879208],
       [4600732913442100328, 4605635428562678698, 4600845337685524498,
        4591663818836242032],
       [4605874666943865762, 4596839529402706876, 4596750912752915272,
        4601247634201405030]])

In [11]:
# create an array of values from a start value up to (but not including) a stop value, with a specific step
np.arange( 10, 40, 5 ) # values from 10 up to 40 (exclusive), stepping by 5's

array([10, 15, 20, 25, 30, 35])

In [12]:
# create an array with a specified number of evenly-spaced values from a start value to a stop value
np.linspace( 10, 40, 5 ) # 5 values from 10 up to 40 (inclusive)

array([10. , 17.5, 25. , 32.5, 40. ])
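The key difference between the two: `arange()` fixes the step and stops before the stop value, while `linspace()` fixes the number of values and includes the stop value by default. `linspace()` also accepts `endpoint=False` to exclude the stop value, as this sketch shows.

```python
import numpy as np

stepped = np.arange(10, 40, 5)    # step fixed at 5; 40 is excluded
print(len(stepped))               # 6

spaced = np.linspace(10, 40, 5)   # count fixed at 5; 40 is included
print(spaced[-1])                 # 40.0

# exclude the stop value: 5 values spaced (40 - 10) / 5 = 6 apart
no_end = np.linspace(10, 40, 5, endpoint=False)
print(no_end)                     # [10. 16. 22. 28. 34.]
```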

## Indexing and slicing
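Indexing and slicing work much as they do with Python lists. Multi-dimensional arrays can also take one index per dimension, separated by commas, as in `a[-2, 2]`.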

In [74]:
# a two-dimensional array
a = np.array([ 
    [1, 2, 3], 
    [4, 5, 6], 
    [7, 8, 9],
    [10, 11, 12]
])

In [75]:
# get the second element from the outer array
a[1]

array([4, 5, 6])

In [76]:
# negative indices work as expected
a[-2]

array([7, 8, 9])

In [79]:
# get the third element from the second sub-array
a[1][2]

6

In [80]:
# get the third element from the second-to-last sub-array
a[-2, 2]

9

In [18]:
# simple slice
a[1 : 3]

array([[4, 5, 6],
       [7, 8, 9]])

In [19]:
# a slice from the third element through the end
a[2 : ]

array([[ 7,  8,  9],
       [10, 11, 12]])

In [20]:
# a slice from the first element up to (but not including) the third
a[ : 2]

array([[1, 2, 3],
       [4, 5, 6]])
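Because `a` is two-dimensional, one pair of brackets can slice both dimensions at once: rows before the comma, columns after it. One difference from Python lists is worth knowing: a slice of an ndarray is a *view* of the same data rather than a copy, so writing to the slice changes the original array. Calling `.copy()` avoids that.

```python
import numpy as np

a = np.array([
    [1, 2, 3],
    [4, 5, 6],
    [7, 8, 9],
    [10, 11, 12]
])

print(a[1, 2])      # same as a[1][2]: 6
print(a[:, 1])      # every row, second column: 2, 5, 8, 11
print(a[1:3, 0:2])  # rows 1-2, columns 0-1: [[4, 5], [7, 8]]

# slices are views: writing to one writes to the original array
row = a[0]
row[0] = 100
print(a[0, 0])      # 100

# .copy() gives an independent array
safe = a[1].copy()
safe[0] = -1
print(a[1, 0])      # still 4
```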

## Simple math operations
Arithmetic operators such as `+`, `-`, `*`, and `/` work element by element, both between two ndarrays of the same shape and between an ndarray and a scalar (which is applied to every element).

In [21]:
# take these two one-dimensional ndarrays for example
a = np.array([1, 2, 3, 4])
b = np.array([4, 3, 2, 1])

In [22]:
# add a scalar to the vector
a + 1

array([2, 3, 4, 5])

In [23]:
# subtract a scalar from the vector
a - 1

array([0, 1, 2, 3])

In [24]:
# add two vectors together
a + b

array([5, 5, 5, 5])

In [25]:
# subtract one vector from another
a - b

array([-3, -1,  1,  3])

In [26]:
# multiply two vectors together, element by element
a * b

array([4, 6, 6, 4])
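Note that `*` multiplies element by element. For the dot product (the sum of those element-wise products), use the `@` operator or `np.dot()`.

```python
import numpy as np

a = np.array([1, 2, 3, 4])
b = np.array([4, 3, 2, 1])

print(a * b)         # element-wise: [4 6 6 4]
print(a @ b)         # dot product: 4 + 6 + 6 + 4 = 20
print(np.dot(a, b))  # same result: 20
```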

In [27]:
# divide one vector by another, element by element (`/` always returns floats, even for integer inputs)
a / b

array([0.25      , 0.66666667, 1.5       , 4.        ])

In [28]:
# these sorts of operations also work with multi-dimensional arrays
a = np.array([ 
    [1, 2, 3], 
    [4, 100, 6], 
    [7, 8, 9],
    [10, 11, 300]
])
b = np.array([
    [1, 1, 1],
    [2, 2, 2],
    [3, 3, 3],
    [4, 4, 300],
])
a + b

array([[  2,   3,   4],
       [  6, 102,   8],
       [ 10,  11,  12],
       [ 14,  15, 600]])

In [29]:
a * 10

array([[  10,   20,   30],
       [  40, 1000,   60],
       [  70,   80,   90],
       [ 100,  110, 3000]])
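Operations between an array and a scalar, like `a * 10` above, are a special case of what NumPy calls *broadcasting*. Arrays of different shapes can also be combined when their shapes are compatible. In this sketch, a one-dimensional array of 3 values is applied to each of the 4 rows of a 4 x 3 array; shapes that don't line up raise a `ValueError`.

```python
import numpy as np

a = np.array([
    [1, 2, 3],
    [4, 100, 6],
    [7, 8, 9],
    [10, 11, 300]
])
row = np.array([1, 10, 100])  # shape (3,) matches the 3 columns of a

result = a + row              # row is added to every row of a
print(result.shape)           # (4, 3)
print(result[0])              # first row: 2, 12, 103

try:
    a + np.array([1, 2])      # shape (2,) does not match 3 columns
except ValueError as err:
    print("incompatible shapes:", err)
```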

In [30]:
# comparisons are also element-wise and return an array of booleans
a > 200

array([[False, False, False],
       [False, False, False],
       [False, False, False],
       [False, False,  True]])

In [31]:
# more comparisons
a != 100

array([[ True,  True,  True],
       [ True, False,  True],
       [ True,  True,  True],
       [ True,  True,  True]])
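Boolean arrays can be combined element by element with `&` (and), `|` (or), and `~` (not). The parentheses around each comparison are required, because these operators bind more tightly than `>` and `<`. Since `True` counts as 1, `.sum()` counts the matches.

```python
import numpy as np

a = np.array([
    [1, 2, 3],
    [4, 100, 6],
    [7, 8, 9],
    [10, 11, 300]
])

between = (a > 5) & (a < 100)    # True where 5 < value < 100
print(between.sum())             # 6 values: 6, 7, 8, 9, 10, 11

outliers = (a >= 100) | (a < 2)  # True where value >= 100 or value < 2
print(outliers.sum())            # 3 values: 1, 100, 300
print(outliers.any())            # True: at least one value matched
```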

## Introspection
Find out some metadata about an array.

In [32]:
# one-dimensional array
x = np.array([23,56,2])
x

array([23, 56,  2])

In [33]:
# two-dimensional array
y = np.array([
    [11,22,33], 
    [45, 90, 6]]
)
y

array([[11, 22, 33],
       [45, 90,  6]])

In [34]:
# check the number of dimensions in y
y.ndim

2

In [35]:
# check the shape of y, which has two rows, three columns
y.shape

(2, 3)

In [36]:
# check the data type of the values in x
x.dtype

dtype('int64')

In [37]:
# change the dtype: astype() returns a new array and leaves x unchanged
new_x = x.astype(float)
new_x.dtype

dtype('float64')
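A few other attributes are handy for checking how big an array is. This sketch uses `float64`, where each value takes 8 bytes.

```python
import numpy as np

z = np.zeros((2, 3), dtype=np.float64)

print(z.size)      # total number of values: 2 * 3 = 6
print(z.itemsize)  # bytes per value for float64: 8
print(z.nbytes)    # total bytes of data: 6 * 8 = 48
print(len(z))      # length of the first dimension only: 2
```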

## Reshaping arrays

In [38]:
# two-dimensional array
y = np.array([
    [11,22,33], 
    [45, 90, 6]]
)

In [39]:
# reshape the original array from two rows, three columns into 1 row, 6 columns
y.reshape(1, 6)

array([[11, 22, 33, 45, 90,  6]])
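The new shape must hold the same number of values (2 x 3 = 6 here). One dimension can be given as `-1`, and `numpy` works out its size from the others. `reshape()` returns a reshaped array and leaves `y`'s own shape unchanged; an impossible shape raises a `ValueError`.

```python
import numpy as np

y = np.array([
    [11, 22, 33],
    [45, 90, 6]
])

print(y.reshape(3, -1).shape)  # -1 is inferred as 2: (3, 2)
print(y.reshape(-1))           # flattened to one dimension: [11 22 33 45 90  6]
print(y.shape)                 # y itself is unchanged: (2, 3)

try:
    y.reshape(4, 2)            # 8 slots for only 6 values
except ValueError as err:
    print("cannot reshape:", err)
```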

In [40]:
# pivot the data, so rows become columns and columns become rows
x = np.array( [ 
    [2, 3, 4], 
    [5, 6, 7] 
] )
x.transpose()

array([[2, 5],
       [3, 6],
       [4, 7]])
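`x.T` is a shorthand attribute for `x.transpose()`. Transposing swaps the shape, so a 2 x 3 array becomes 3 x 2.

```python
import numpy as np

x = np.array([
    [2, 3, 4],
    [5, 6, 7]
])

print(x.shape)    # (2, 3)
print(x.T.shape)  # (3, 2)
print(np.array_equal(x.T, x.transpose()))  # True
print(x.T[0])     # the first column of x becomes the first row: [2 5]
```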

## Merging arrays

In [86]:
# append new values to an existing array; np.append() returns a new array and does not modify x in place
x = np.array([2, 3, 4])
np.append(x, [5,6,7])

array([2, 3, 4, 5, 6, 7])
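`np.append()` can surprise with multi-dimensional arrays: without an `axis` argument it flattens its inputs to one dimension before joining them. Passing `axis=0` keeps the rows intact, as long as the appended values have a matching shape.

```python
import numpy as np

grid = np.array([
    [2, 3, 4],
    [5, 6, 7]
])

flat = np.append(grid, [[8, 9, 10]])             # no axis: result is flattened
print(flat.shape)                                # (9,)

stacked = np.append(grid, [[8, 9, 10]], axis=0)  # axis=0: adds a new row
print(stacked.shape)                             # (3, 3)
```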

In [87]:
# join two one-dimensional arrays with the same shape along a specified axis
x = np.array( [2, 3, 4] )
y = np.array( [5, 6, 7] )
np.concatenate( (x, y), axis=0)

array([2, 3, 4, 5, 6, 7])

In [88]:
# join two two-dimensional arrays with the same shape along a specified axis
# with axis=0, the arrays will be merged 'vertically'
x = np.array( [ 
    [2, 3, 4], 
    [5, 6, 7] 
] )
y = np.array( [ 
    [8, 9, 10], 
    [11, 12, 13] 
]  )
np.concatenate( (x, y), axis=0)

array([[ 2,  3,  4],
       [ 5,  6,  7],
       [ 8,  9, 10],
       [11, 12, 13]])

In [44]:
# same as above, but with axis=1, merges 'horizontally'
np.concatenate( (x, y), axis=1)

array([[ 2,  3,  4,  8,  9, 10],
       [ 5,  6,  7, 11, 12, 13]])
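`np.vstack()` and `np.hstack()` are convenience functions that, for two-dimensional arrays like these, give the same results as `concatenate()` with `axis=0` and `axis=1`.

```python
import numpy as np

x = np.array([[2, 3, 4], [5, 6, 7]])
y = np.array([[8, 9, 10], [11, 12, 13]])

vertical = np.vstack((x, y))    # like np.concatenate((x, y), axis=0)
horizontal = np.hstack((x, y))  # like np.concatenate((x, y), axis=1)

print(vertical.shape)    # (4, 3)
print(horizontal.shape)  # (2, 6)
print(np.array_equal(vertical, np.concatenate((x, y), axis=0)))    # True
print(np.array_equal(horizontal, np.concatenate((x, y), axis=1)))  # True
```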

## Filtering
It is possible to filter the values in an ndarray using booleans to indicate which values to keep.

In [45]:
# filter using an array of boolean values
x = np.array([23,56,2])
filter = [False, True, False]
x[ filter ]

array([56])

In [46]:
# filter using a boolean expression
filter = x > 50
x[ filter ]

array([56])
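A boolean filter selects the matching values themselves. To find *where* the matches are, `np.nonzero()` returns their indices, and `np.where()` with three arguments picks between two values element by element. The filter can also be written inline, without a separate variable.

```python
import numpy as np

x = np.array([23, 56, 2])

print(x[x > 20])           # inline filter: [23 56]

# a tuple with one index array per dimension; here the matches are at 0 and 1
positions = np.nonzero(x > 20)
print(positions[0])

# choose "big" where the condition is True, "small" where it is False
labels = np.where(x > 20, "big", "small")
print(labels)              # ['big' 'big' 'small']
```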

In [83]:
y = np.array( ['hippopotamus', 'giraffe', 'platypus', 'kiwi', 'albatross' ] )

# to select only those animals with a 'p' in their name, first find the position of 'p' in each name (-1 means not found)
find_results = np.char.find(y, 'p')
find_results

array([ 2, -1,  0, -1, -1])

In [84]:
filter = find_results >= 0
filter

array([ True, False,  True, False, False])

In [85]:
y[ filter ]

array(['hippopotamus', 'platypus'], dtype='<U12')

## Finding unique values
`np.unique()` returns the sorted unique values of an array, flattened to one dimension.

In [48]:
# Take the following array, which contains some redundant values
a = np.array([
    [ 100,  200,  300],
    [ 300,  400,  500],
    [ 500,  600, 700],
    [700, 800, 900]
])

In [49]:
# get an array that contains just the unique values
unique_values = np.unique(a)
unique_values

array([100, 200, 300, 400, 500, 600, 700, 800, 900])

In [50]:
# get the index (in the flattened original array) of the first occurrence of each unique value
unique_values, index_list = np.unique(a, return_index=True)
index_list

array([ 0,  1,  2,  4,  5,  7,  8, 10, 11])
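`np.unique()` can also count how often each unique value occurs with `return_counts=True`, which is a quick way to spot the repeated values.

```python
import numpy as np

a = np.array([
    [100, 200, 300],
    [300, 400, 500],
    [500, 600, 700],
    [700, 800, 900]
])

values, counts = np.unique(a, return_counts=True)
print(counts)              # [1 1 2 1 2 1 2 1 1]
print(values[counts > 1])  # values that appear more than once: [300 500 700]
```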

## Removing nan values
Missing values (represented in `numpy` floating-point arrays as `nan`, "not a number") can be removed by combining the `isnan()` function with the `~` operator, which inverts each value of a boolean array.

In [51]:
# a one-dimensional ndarray with a few nan values
x = np.array([np.nan, 1, 12, np.nan, 3, 41]) 

In [52]:
np.isnan(x)

array([ True, False, False,  True, False, False])

In [53]:
# first, here's how to filter to only include nan values but remove everything else... not exactly what we want
filter = np.isnan(x) # returns an array, [True, False, False, True, False, False]
x[ filter ] # results in an array with only the two nan values in it

array([nan, nan])

In [54]:
# using the complement operator to invert the logic of the previous filter
filter = np.isnan(x) # returns an array, [True, False, False, True, False, False]
x[ ~filter ] # results in an array with everything except the nan values in it

array([ 1., 12.,  3., 41.])
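If the goal is a statistic rather than a cleaned array, NumPy's `nan`-aware functions such as `np.nanmean()`, `np.nanmax()`, and `np.nansum()` skip `nan` values directly, whereas the plain versions return `nan` as soon as any value is `nan`.

```python
import numpy as np

x = np.array([np.nan, 1, 12, np.nan, 3, 41])

print(np.mean(x))                # nan, because x contains nan values
print(np.nanmean(x))             # (1 + 12 + 3 + 41) / 4 = 14.25
print(np.nanmax(x))              # 41.0
print(np.mean(x[~np.isnan(x)]))  # same as nanmean: 14.25
```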

## Basic statistics

In [55]:
x = np.array([
    [ 2, 50, 100],
    [ 3, 60,  200],
    [ 4, 55, 150],
    [ 5, 40, 250]
])

In [56]:
# calculate the mean of all values in the array... np.median(x) works similarly (ndarrays have no median() method)
x.mean()

76.58333333333333

In [57]:
# calculate means 'vertically', down each column... np.median() works similarly
np.mean(x, axis=0)

array([  3.5 ,  51.25, 175.  ])

In [58]:
# calculate means 'horizontally', across each row... np.median() works similarly
np.mean(x, axis=1)

array([50.66666667, 87.66666667, 69.66666667, 98.33333333])
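For comparison, here is the median computed the same three ways. Unlike `mean()`, there is no `x.median()` method on ndarrays, so the function form `np.median()` is used.

```python
import numpy as np

x = np.array([
    [2, 50, 100],
    [3, 60, 200],
    [4, 55, 150],
    [5, 40, 250]
])

print(np.median(x))          # all 12 values: (50 + 55) / 2 = 52.5
print(np.median(x, axis=0))  # per column: 3.5, 52.5, 175.0
print(np.median(x, axis=1))  # per row: 50.0, 60.0, 55.0, 40.0
```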

In [59]:
# determine the minimum value of all values in the array... the max() function works similarly
x.min()

2

In [60]:
# determine the minimum value 'vertically'... the amax() function works similarly
np.amin(x, axis=0)

array([  2,  40, 100])

In [61]:
# determine the minimum value 'horizontally'... the amax() function works similarly
np.amin(x, axis=1)

array([2, 3, 4, 5])

In [62]:
# the standard deviation of all values in the array
x.std()

79.26691021829699

In [63]:
# standard deviation 'vertically'
np.std(x, axis=0)

array([ 1.11803399,  7.39509973, 55.90169944])

In [64]:
# standard deviation 'horizontally'
np.std(x, axis=1)

array([ 40.01110957,  82.77009659,  60.49977043, 108.19221578])
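By default `std()` computes the *population* standard deviation, dividing by n. For the *sample* standard deviation, which divides by n - 1, pass `ddof=1`. The sketch below checks the relationship between the two on the same array.

```python
import numpy as np

x = np.array([
    [2, 50, 100],
    [3, 60, 200],
    [4, 55, 150],
    [5, 40, 250]
])

population = np.std(x, axis=0)      # divides by n = 4
sample = np.std(x, axis=0, ddof=1)  # divides by n - 1 = 3

# the two differ by a factor of sqrt(n / (n - 1))
print(np.allclose(sample, population * np.sqrt(4 / 3)))  # True
print(sample[0])  # about 1.291 for the first column [2, 3, 4, 5]
```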