# NumPy Basics: Arrays and Vectorized Computation

NumPy, short for Numerical Python, is the fundamental package for high-performance scientific computing
and data analysis in Python.

Here is what it provides:<br />

1- ndarray, a fast and space-efficient multidimensional array. <br />
2- Standard mathematical functions for fast operations on entire arrays of data without having to write loops.<br />
3- Tools for reading / writing array data to disk and working with memory-mapped files.<br />
4- Linear algebra, random number generation and Fourier transform capabilities.<br />
5- Tools for integrating code written in C/C++ and Fortran.
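
As a rough illustration of the second point, the sketch below doubles every element of an array with a single expression and compares it with a plain Python loop. The variable names are made up for this example.

```python
import numpy as np

values = np.arange(5)                      # array([0, 1, 2, 3, 4])
doubled = values * 2                       # one vectorized operation, no explicit loop
doubled_loop = [v * 2 for v in range(5)]   # the pure-Python equivalent

print(doubled)
print(doubled_loop)
```

Both approaches produce the same numbers; the array version simply hands the loop over to NumPy.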

## The NumPy ndarray: A Multidimensional Array Object

One of the key features of NumPy is its N-dimensional array object, or ndarray, which is a fast, flexible container
for large data sets in Python.

### Creating ndarrays

In [1]:
# need to import the numpy library
import numpy as np

In [2]:
# one dimensional array
data1 = [6, 7.5, 8, 0, 1]
arr1 = np.array(data1)
arr1

array([ 6. ,  7.5,  8. ,  0. ,  1. ])

In [3]:
# two dimensional array
data2 = [[1, 2, 3, 4], [5, 6, 7, 8]]
arr2 = np.array(data2)

In [4]:
# dimension of the array
arr2.ndim

2

In [5]:
# shape of the array
arr2.shape

(2, 4)

In [6]:
# data type of the array
arr1.dtype

dtype('float64')

In [7]:
# size of the array
arr2.size

8

In [8]:
# number of rows
len(arr2)

2

In [9]:
# number of columns
# refer to this after reading about slicing
len(arr2[0,:])

4

In [10]:
# create a one-dimensional array of zeros
np.zeros(10)

array([ 0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.])

In [11]:
# create a one-dimensional array of ones
np.ones(5)

array([ 1.,  1.,  1.,  1.,  1.])

In [12]:
# create a two-dimensional array of zeros (the shape is passed as a tuple)
np.zeros((3,5))

array([[ 0.,  0.,  0.,  0.,  0.],
       [ 0.,  0.,  0.,  0.,  0.],
       [ 0.,  0.,  0.,  0.,  0.]])

In [13]:
# similar to the built-in range, but returns a one-dimensional array
np.arange(10)

array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

In [14]:
arr2

array([[1, 2, 3, 4],
       [5, 6, 7, 8]])

In [15]:
# create an array of ones with the same shape and dtype as arr2
arr3 = np.ones_like(arr2)
arr3

array([[1, 1, 1, 1],
       [1, 1, 1, 1]])

In [16]:
# create an array of zeros with the same shape and dtype as arr2
arr4 = np.zeros_like(arr2)
arr4

array([[0, 0, 0, 0],
       [0, 0, 0, 0]])

In [17]:
# create empty array (allocating new memory so values might be garbage)
arr5 = np.empty((3, 4))
arr5

array([[ 0.,  0.,  0.,  0.],
       [ 0.,  0.,  0.,  0.],
       [ 0.,  0.,  0.,  0.]])

In [18]:
# create an uninitialized array with the same shape and dtype as arr2
arr6 = np.empty_like(arr2)
arr6

array([[809933936,       484,         0,         0],
       [        2,         0,         2,         0]])

In [19]:
# create n x n identity matrix
arr7 = np.identity(5)
arr7

array([[ 1.,  0.,  0.,  0.,  0.],
       [ 0.,  1.,  0.,  0.,  0.],
       [ 0.,  0.,  1.,  0.,  0.],
       [ 0.,  0.,  0.,  1.,  0.],
       [ 0.,  0.,  0.,  0.,  1.]])

In [20]:
# create n x n identity matrix
arr8 = np.eye(3)
arr8

array([[ 1.,  0.,  0.],
       [ 0.,  1.,  0.],
       [ 0.,  0.,  1.]])

### Data Types for ndarrays

In [21]:
arr1 = np.array([1,2,3], dtype=np.float64)
arr1.dtype

dtype('float64')

In [22]:
arr2 = np.array([1, 2, 3], dtype=np.int32)
arr2.dtype

dtype('int32')

#### Array types:<br />
int8, uint8<br />
int16, uint16<br />
int32, uint32<br />
int64, uint64<br />
float16<br />
float32<br />
float64<br />
float128<br />
complex64, complex128<br />
complex256<br />
bool<br />
object<br />
string_ (spelled bytes_ in NumPy 2.0 and later)<br />
unicode_ (spelled str_ in NumPy 2.0 and later)<br />
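
The number in most of these type names is the size of one element in bits. A small sketch of how that translates into memory use follows; the arrays `small` and `large` are illustrative only.

```python
import numpy as np

# itemsize reports the size of one element in bytes
for name in ["int8", "int32", "float64", "complex128"]:
    print(name, np.dtype(name).itemsize)

small = np.zeros(1000, dtype=np.float32)
large = np.zeros(1000, dtype=np.float64)

# nbytes is the total memory taken by the array elements
print(small.nbytes, large.nbytes)
```

Choosing a narrower dtype halves the memory here, at the cost of precision.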

In [23]:
arr = np.array([1, 2, 3])
arr.dtype

dtype('int32')

In [24]:
float_arr = arr.astype(np.float64)
float_arr

array([ 1.,  2.,  3.])

In [25]:
arr = np.array([3.7, -1.2, -2.6, 0.5, 12.9, 10.1])
arr.astype(np.int32)

array([ 3, -1, -2,  0, 12, 10])

In [26]:
# you can drop the dtype and get the same result
# np.bytes_ is the byte string type (spelled np.string_ in older NumPy versions)
numeric_strings = np.array(['1.2', '3.4', '5.6'], dtype=np.bytes_)
numeric_strings.astype(np.float64)

array([ 1.2,  3.4,  5.6])
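
Two details of `astype` are easy to miss: casting floats to integers drops the fractional part rather than rounding, and by default the call returns a new array even when the dtype does not change. A minimal sketch, with made-up variable names:

```python
import numpy as np

floats = np.array([3.7, -1.2, -2.6, 0.5])

truncated = floats.astype(np.int32)          # fractional part dropped (toward zero)
rounded = np.rint(floats).astype(np.int32)   # round to the nearest integer first

same_dtype = floats.astype(np.float64)       # still a new array
same_dtype[0] = 0.0                          # floats is not affected

print(truncated, rounded, floats[0])
```

Note that `np.rint` rounds halves to the nearest even integer, so 0.5 becomes 0.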

### Operations between Arrays and Scalars

In [27]:
arr = np.array([[1, 2, 3], [4, 5, 6]])
arr

array([[1, 2, 3],
       [4, 5, 6]])

In [28]:
arr * arr

array([[ 1,  4,  9],
       [16, 25, 36]])

In [29]:
arr + arr

array([[ 2,  4,  6],
       [ 8, 10, 12]])

In [30]:
arr - arr

array([[0, 0, 0],
       [0, 0, 0]])

In [31]:
1.0 / arr

array([[ 1.        ,  0.5       ,  0.33333333],
       [ 0.25      ,  0.2       ,  0.16666667]])

In [32]:
arr ** 2

array([[ 1,  4,  9],
       [16, 25, 36]], dtype=int32)
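
The cells above mostly combine an array with itself. When one operand is a scalar, it is applied to every element, and mixing an integer array with a float scalar gives a float result. A short sketch under those assumptions:

```python
import numpy as np

arr = np.array([[1, 2, 3], [4, 5, 6]])

scaled = arr * 10      # the scalar is applied to every element
shifted = arr + 0.5    # integer array + float scalar gives a float array

print(scaled)
print(shifted.dtype)
```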

### Basic Indexing and Slicing

In [33]:
arr = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9])
arr

array([1, 2, 3, 4, 5, 6, 7, 8, 9])

In [34]:
arr[5]

6

In [35]:
arr[5:8]

array([6, 7, 8])

In [36]:
arr[5:8] = 12
arr

array([ 1,  2,  3,  4,  5, 12, 12, 12,  9])

In [37]:
# IMPORTANT: slices are views of the original array, so a change to the view affects the original
arr_slice = arr[5:8]
arr_slice[1] = 1000
arr

array([   1,    2,    3,    4,    5,   12, 1000,   12,    9])

In [41]:
arr_slice[:] = 64
arr

array([ 1,  2,  3,  4,  5, 64, 64, 64,  9])

In [42]:
# this is how you create a new array rather than a view of the original array
arr_new = np.array(arr[5:8])
arr[6] = 200
# no side effect on arr_new
arr_new

array([64, 64, 64])

In [43]:
# or you can use
arr_new = arr[5:8].copy()
arr_new

array([ 64, 200,  64])
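
If it is unclear whether a result is a view or a copy, `np.shares_memory` and the `base` attribute can be used to check. The following is a small sketch with illustrative names, not part of the original session:

```python
import numpy as np

arr = np.arange(10)
view = arr[5:8]             # basic slice: a view onto arr
copied = arr[5:8].copy()    # explicit copy: independent data

print(np.shares_memory(arr, view))     # the view reuses arr's buffer
print(np.shares_memory(arr, copied))   # the copy has its own buffer
print(view.base is arr)                # a view remembers the array it came from
```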

In [44]:
# some examples for higher dimensional arrays
arr2d = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
arr2d[2]

array([7, 8, 9])

In [45]:
arr2d[2][2]

9

In [46]:
# or you can
arr2d[2, 2]

9

In [47]:
# examples for 3D arrays
arr3d = np.array([[[1, 2, 3], [3, 4, 5]], [[6, 7, 8], [9, 10 , 11]]])
arr3d              

array([[[ 1,  2,  3],
        [ 3,  4,  5]],

       [[ 6,  7,  8],
        [ 9, 10, 11]]])

In [48]:
# each index you supply steps inside one level of brackets
# so the expression below returns a 2 x 3 array
arr3d[0]

array([[1, 2, 3],
       [3, 4, 5]])

In [49]:
arr3d[0][1]

array([3, 4, 5])

In [50]:
arr3d[0][1][2]

5

In [51]:
# or you can type
arr3d[0, 1, 2]

5

In [52]:
# some more operations
# again, you need copy() so that old_values is not a view
old_values = arr3d[0].copy()
arr3d[0] = 42
arr3d

array([[[42, 42, 42],
        [42, 42, 42]],

       [[ 6,  7,  8],
        [ 9, 10, 11]]])

In [53]:
arr3d[0] = old_values
arr3d

array([[[ 1,  2,  3],
        [ 3,  4,  5]],

       [[ 6,  7,  8],
        [ 9, 10, 11]]])

### Indexing with Slices

In [54]:
arr2d

array([[1, 2, 3],
       [4, 5, 6],
       [7, 8, 9]])

In [55]:
arr2d[:2]

array([[1, 2, 3],
       [4, 5, 6]])

In [56]:
arr2d[:2, 1:]

array([[2, 3],
       [5, 6]])

In [57]:
arr2d[1, :2]

array([4, 5])

In [58]:
arr2d[2, :1]

array([7])

In [59]:
arr2d[:, :1]

array([[1],
       [4],
       [7]])

In [62]:
arr2d[:2, 1:] = 1000
arr2d

array([[   1, 1000, 1000],
       [   4, 1000, 1000],
       [   7,    8,    9]])

### Boolean Indexing

In [63]:
names = np.array(['Bob', 'Joe', 'Will', 'Bob', 'Will', 'Joe', 'Joe'])
# random samples from the standard normal distribution (mean 0, standard deviation 1)
data = np.random.randn(7, 4)

In [64]:
names.shape

(7,)

In [65]:
data

array([[ 0.33991316,  0.44655472, -0.59535177, -0.41227647],
       [-0.23017376, -0.28943333,  0.29611074,  1.29011785],
       [ 0.29466714, -1.08029722,  0.80006734,  1.3247759 ],
       [ 0.6508215 , -1.62242832, -1.44300344,  0.67693852],
       [ 0.1940658 ,  0.10991205, -1.97036678, -1.88849743],
       [ 1.26790042, -0.03818258,  1.21188182,  0.13016965],
       [-1.18737781,  0.55513399, -1.3291927 , -0.55600548]])

In [66]:
names == 'Bob'


array([ True, False, False,  True, False, False, False], dtype=bool)

In [67]:
# matches the row with above True-False and picks only the True ones
data[names == 'Bob', 2:]

array([[-0.59535177, -0.41227647],
       [-1.44300344,  0.67693852]])

In [68]:
data[names == 'Bob', 3]

array([-0.41227647,  0.67693852])

In [69]:
# To select everything but Bob
names != 'Bob'

array([False,  True,  True, False,  True,  True,  True], dtype=bool)

In [70]:
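# or you can negate the mask with ~ (the - operator is not supported on boolean arrays)
data[~(names == 'Bob')]

array([[-0.23017376, -0.28943333,  0.29611074,  1.29011785],
       [ 0.29466714, -1.08029722,  0.80006734,  1.3247759 ],
       [ 0.1940658 ,  0.10991205, -1.97036678, -1.88849743],
       [ 1.26790042, -0.03818258,  1.21188182,  0.13016965],
       [-1.18737781,  0.55513399, -1.3291927 , -0.55600548]])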

#### Note: Selecting data from an array by boolean indexing always creates a copy of the data
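
The sketch below illustrates the difference between reading through a boolean mask, which copies, and assigning through one, which writes into the original array. The array contents are invented for the example.

```python
import numpy as np

data = np.array([1.0, -2.0, 3.0, -4.0])

negatives = data[data < 0]   # boolean indexing returns a copy
negatives[:] = 0             # modifies only the copy
snapshot = data.copy()       # data still holds its negative values here

data[data < 0] = 0           # assignment through the mask changes data itself

print(snapshot)
print(data)
```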

In [71]:
# you can use & and | for boolean expressions
mask = (names == 'Bob') | (names == 'Will')
mask

array([ True, False,  True,  True,  True, False, False], dtype=bool)

In [72]:
data[mask]

array([[ 0.33991316,  0.44655472, -0.59535177, -0.41227647],
       [ 0.29466714, -1.08029722,  0.80006734,  1.3247759 ],
       [ 0.6508215 , -1.62242832, -1.44300344,  0.67693852],
       [ 0.1940658 ,  0.10991205, -1.97036678, -1.88849743]])

#### Note: The Python keywords and/or do not work with boolean arrays; use & and | instead
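
A minimal sketch of what goes wrong: `and` asks Python for a single truth value of the whole array, which NumPy refuses to guess. The arrays `a` and `b` are illustrative.

```python
import numpy as np

a = np.array([True, False, True])
b = np.array([True, True, False])

elementwise = a & b          # combines the arrays element by element

try:
    a and b                  # needs one truth value for a whole array
except ValueError as exc:
    message = str(exc)       # NumPy reports that the truth value is ambiguous

print(elementwise)
print(message)
```

When the operands are comparisons, keep the parentheses, as in `(names == 'Bob') | (names == 'Will')`, because `&` and `|` bind more tightly than `==`.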

In [73]:
# setting all negative values in the data array to zero
data[data < 0] = 0
data

array([[ 0.33991316,  0.44655472,  0.        ,  0.        ],
       [ 0.        ,  0.        ,  0.29611074,  1.29011785],
       [ 0.29466714,  0.        ,  0.80006734,  1.3247759 ],
       [ 0.6508215 ,  0.        ,  0.        ,  0.67693852],
       [ 0.1940658 ,  0.10991205,  0.        ,  0.        ],
       [ 1.26790042,  0.        ,  1.21188182,  0.13016965],
       [ 0.        ,  0.55513399,  0.        ,  0.        ]])

### Fancy Indexing

In [74]:
arr = np.empty((8, 4))
for i in range(len(arr)):
    arr[i] = i
arr

array([[ 0.,  0.,  0.,  0.],
       [ 1.,  1.,  1.,  1.],
       [ 2.,  2.,  2.,  2.],
       [ 3.,  3.,  3.,  3.],
       [ 4.,  4.,  4.,  4.],
       [ 5.,  5.,  5.,  5.],
       [ 6.,  6.,  6.,  6.],
       [ 7.,  7.,  7.,  7.]])

In [75]:
# fancy indexing
# selects whole rows in the order given by the list
arr[[4, 3, 0, 6]]

array([[ 4.,  4.,  4.,  4.],
       [ 3.,  3.,  3.,  3.],
       [ 0.,  0.,  0.,  0.],
       [ 6.,  6.,  6.,  6.]])

In [76]:
# negative indices count from the end, so -1 is the last row
arr[[-3, -5, -7]]

array([[ 5.,  5.,  5.,  5.],
       [ 3.,  3.,  3.,  3.],
       [ 1.,  1.,  1.,  1.]])

In [77]:
# reshape being introduced here
arr = np.arange(32).reshape((8, 4))
arr

array([[ 0,  1,  2,  3],
       [ 4,  5,  6,  7],
       [ 8,  9, 10, 11],
       [12, 13, 14, 15],
       [16, 17, 18, 19],
       [20, 21, 22, 23],
       [24, 25, 26, 27],
       [28, 29, 30, 31]])

In [79]:
# another fancy indexing
# intersection of rows and columns in order
arr[[1, 5, 7, 2], [0, 3, 1, 2]]

array([ 4, 23, 29, 10])

In [80]:
# to select the rectangular block formed by every listed row and every listed column, use np.ix_:
# arr[np.ix_([1, 5, 7, 2], [0, 3, 1, 2])]
# to select the complete rows only, use:
arr[[1, 5, 7, 2],:]

array([[ 4,  5,  6,  7],
       [20, 21, 22, 23],
       [28, 29, 30, 31],
       [ 8,  9, 10, 11]])

#### Note: Fancy indexing, unlike slicing, always copies the data into a new array
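
To make the `np.ix_` form concrete, the sketch below selects the block of rows and columns and checks that it matches a two-step selection. It also shows that the result is a copy. The names `block` and `block_two_step` are made up for this example.

```python
import numpy as np

arr = np.arange(32).reshape((8, 4))

# np.ix_ pairs every listed row with every listed column
block = arr[np.ix_([1, 5, 7, 2], [0, 3, 1, 2])]

# equivalent two-step form: pick the rows, then reorder the columns
block_two_step = arr[[1, 5, 7, 2]][:, [0, 3, 1, 2]]
same = np.array_equal(block, block_two_step)

block[0, 0] = -1     # the result is a copy, so arr is unaffected

print(same)
print(arr[1, 0])
```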

### Transposing Arrays and Swapping Axes

In [81]:
arr = np.arange(15).reshape((3, 5))
arr

array([[ 0,  1,  2,  3,  4],
       [ 5,  6,  7,  8,  9],
       [10, 11, 12, 13, 14]])

In [82]:
# the transpose of an array, which is a view of the original data
arr.T

array([[ 0,  5, 10],
       [ 1,  6, 11],
       [ 2,  7, 12],
       [ 3,  8, 13],
       [ 4,  9, 14]])

In [83]:
arr = np.array([[1, 2], [3, 4]])
arr

array([[1, 2],
       [3, 4]])

In [84]:
# matrix multiplication
arr.dot(arr)

array([[ 7, 10],
       [15, 22]])

In [85]:
# or you can type
np.dot(arr, arr)

array([[ 7, 10],
       [15, 22]])

In [86]:
arr = np.random.randn(6, 3)
np.dot(arr.T, arr)

array([[  1.907123  ,  -0.17736628,  -0.78911568],
       [ -0.17736628,   7.80041607,  -2.18620899],
       [ -0.78911568,  -2.18620899,  15.01126596]])

In [87]:
arr = np.array([[1, 2, 3], [4, 5, 6]])
arr

array([[1, 2, 3],
       [4, 5, 6]])

In [88]:
# transpose permutes the axes. Axes are numbered 0, 1, ... up to the number of dimensions minus one
# the following swaps the rows and columns
arr.transpose(1, 0)

array([[1, 4],
       [2, 5],
       [3, 6]])

In [89]:
arr = np.arange(16).reshape((2, 2, 4))
arr

array([[[ 0,  1,  2,  3],
        [ 4,  5,  6,  7]],

       [[ 8,  9, 10, 11],
        [12, 13, 14, 15]]])

In [90]:
# the following keeps the last axis in place and swaps the first axis with the second
# to see what happens, write an element as A[i, j, k]: after the call it sits at position [j, i, k]
arr.transpose(1, 0, 2)

array([[[ 0,  1,  2,  3],
        [ 8,  9, 10, 11]],

       [[ 4,  5,  6,  7],
        [12, 13, 14, 15]]])

In [91]:
# swapaxes works like transpose but takes a pair of axes to swap
arr

array([[[ 0,  1,  2,  3],
        [ 4,  5,  6,  7]],

       [[ 8,  9, 10, 11],
        [12, 13, 14, 15]]])

In [92]:
arr.swapaxes(1,2)

array([[[ 0,  4],
        [ 1,  5],
        [ 2,  6],
        [ 3,  7]],

       [[ 8, 12],
        [ 9, 13],
        [10, 14],
        [11, 15]]])
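
A quick way to follow these axis permutations is to look at the resulting shapes and at one element. The sketch below does that and also shows that both calls return views of the original data; the variable names are illustrative.

```python
import numpy as np

arr = np.arange(16).reshape((2, 2, 4))

swapped = arr.swapaxes(1, 2)         # shape (2, 4, 2)
permuted = arr.transpose(1, 0, 2)    # shape (2, 2, 4), axes 0 and 1 exchanged

# element [i, j, k] of arr sits at [j, i, k] after transpose(1, 0, 2)
same_element = permuted[1, 0, 3] == arr[0, 1, 3]

swapped[0, 0, 0] = 99                # writes through the view into arr

print(swapped.shape, permuted.shape)
print(same_element, arr[0, 0, 0])
```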

### Universal Functions: Fast Element-wise Array Functions

A universal function, or ufunc, is a function that performs elementwise operations on data in ndarrays. You can think of them as fast vectorized wrappers for simple functions that take one or more scalar values and produce one or more scalar results.
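
As a small sketch of this idea, the example below calls one unary and one binary ufunc and then uses the optional `out` argument to write a result into an existing array. The input values are invented.

```python
import numpy as np

x = np.array([1.0, 4.0, 9.0])
y = np.array([2.0, 2.0, 2.0])

roots = np.sqrt(x)           # unary ufunc: one input array
totals = np.add(x, y)        # binary ufunc: same result as x + y

# ufuncs accept an optional out argument; here the product overwrites x
np.multiply(x, y, out=x)

print(roots, totals, x)
```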

In [93]:
arr = np.arange(10)
# unary universal function of sqrt
np.sqrt(arr)

array([ 0.        ,  1.        ,  1.41421356,  1.73205081,  2.        ,
        2.23606798,  2.44948974,  2.64575131,  2.82842712,  3.        ])

In [94]:
# unary universal function of exponent
np.exp(arr)

array([  1.00000000e+00,   2.71828183e+00,   7.38905610e+00,
         2.00855369e+01,   5.45981500e+01,   1.48413159e+02,
         4.03428793e+02,   1.09663316e+03,   2.98095799e+03,
         8.10308393e+03])

In [95]:
x = np.random.randn(8)
y = np.random.randn(8)
x

array([ 0.85224899, -0.43204118, -0.96615741,  0.37576571,  1.04377382,
       -0.67098303, -1.03100844, -0.61089428])

In [96]:
y

array([-0.38083686,  0.40664521,  0.04526993,  0.55768849, -1.67615334,
        0.51485889, -1.64223063, -0.77763869])

In [97]:
# binary universal function of maximum (compares element by element in order)
np.maximum(x, y)

array([ 0.85224899,  0.40664521,  0.04526993,  0.55768849,  1.04377382,
        0.51485889, -1.03100844, -0.61089428])

In [98]:
arr = np.random.randn(8)
# modf returns two arrays as a tuple: the fractional parts and the integral parts of the numbers
np.modf(arr)

(array([-0.20502742, -0.20801467,  0.92214999, -0.88735799,  0.10819737,
         0.07633621,  0.84432361, -0.07882151]),
 array([-3., -0.,  0., -0.,  1.,  1.,  0., -0.]))

#### Some unary ufuncs (Please refer to the NumPy documentation for the explanation of each)

abs, fabs<br />
sqrt<br />
square<br />
exp<br />
log, log10, log2, log1p<br />
sign<br />
ceil<br />
floor<br />
rint<br />
modf<br />
isnan<br />
isfinite, isinf<br />
cos, cosh, sin, sinh<br />
tan, tanh<br />
arccos, arccosh, arcsin<br />
arcsinh, arctan, arctanh<br />
logical_not<br />

#### Some binary ufuncs (Please refer to the NumPy documentation for the explanation of each)

add<br />
subtract<br />
multiply<br />
divide, floor_divide<br />
power<br />
maximum, fmax<br />
minimum, fmin<br />
mod<br />
copysign<br />
greater, greater_equal<br />
less, less_equal, equal<br />
not_equal<br />
logical_and<br />
logical_or<br />
logical_xor<br />

### Data Processing Using Arrays


Using NumPy arrays enables you to express many kinds of data processing tasks as concise array expressions that 
might otherwise require writing loops. This practice of replacing explicit loops with array expressions is commonly 
referred to as vectorization. In general, vectorized array operations will often be one or two (or more) orders
of magnitude faster than their pure Python equivalents.
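
To make the idea concrete, the sketch below computes the same quantity once with an explicit Python loop and once as an array expression, then checks that the results agree. It does not measure speed, and the function being evaluated is chosen only for illustration.

```python
import math
import numpy as np

xs = np.linspace(0.0, 1.0, 1001)

# explicit Python loop over the elements
loop_result = [math.sin(x) ** 2 + math.cos(x) ** 2 for x in xs]

# equivalent vectorized array expression
vec_result = np.sin(xs) ** 2 + np.cos(xs) ** 2

print(np.allclose(loop_result, vec_result))
```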

In [99]:
# let's say you want to evaluate the function sqrt(x^2 + y^2) across a regular grid of values.
# np.meshgrid takes two 1D arrays and produces two 2D arrays, one with the x and one with the y coordinate of every grid point
points = np.arange(0, 10, 2)
points

array([0, 2, 4, 6, 8])

In [100]:
xs, ys = np.meshgrid(points, points)
xs

array([[0, 2, 4, 6, 8],
       [0, 2, 4, 6, 8],
       [0, 2, 4, 6, 8],
       [0, 2, 4, 6, 8],
       [0, 2, 4, 6, 8]])

In [101]:
ys

array([[0, 0, 0, 0, 0],
       [2, 2, 2, 2, 2],
       [4, 4, 4, 4, 4],
       [6, 6, 6, 6, 6],
       [8, 8, 8, 8, 8]])

In [102]:
z = np.sqrt(xs ** 2 + ys ** 2)
z

array([[  0.        ,   2.        ,   4.        ,   6.        ,   8.        ],
       [  2.        ,   2.82842712,   4.47213595,   6.32455532,
          8.24621125],
       [  4.        ,   4.47213595,   5.65685425,   7.21110255,
          8.94427191],
       [  6.        ,   6.32455532,   7.21110255,   8.48528137,  10.        ],
       [  8.        ,   8.24621125,   8.94427191,  10.        ,  11.3137085 ]])

### Expressing Conditional Logic as Array Operations

The numpy.where function is a vectorized version of the ternary expression x if condition else y.

In [103]:
xarr = np.array([1.1, 1.2, 1.3, 1.4, 1.5])
yarr = np.array([2.1, 2.2, 2.3, 2.4, 2.5])
cond = np.array([True, False, True, True, False])
# zip() is a built-in Python function that returns an iterator aggregating elements from each of the iterables.
result = [(x if c else y) for x, y, c in zip(xarr, yarr, cond)]
result

[1.1000000000000001, 2.2000000000000002, 1.3, 1.3999999999999999, 2.5]

This has multiple problems. <br />
First, it will not be very fast for large arrays, because all the work is done in pure Python. <br />
Second, it will not work with multidimensional arrays. <br />
With np.where you can write: <br />

In [104]:
result = np.where(cond, xarr, yarr)
result

array([ 1.1,  2.2,  1.3,  1.4,  2.5])
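
The same call also handles multidimensional input, which the list comprehension cannot do directly. A minimal sketch with invented 2 x 2 arrays:

```python
import numpy as np

cond = np.array([[True, False], [False, True]])
xarr = np.array([[1, 2], [3, 4]])
yarr = np.array([[10, 20], [30, 40]])

# take from xarr where cond is True, otherwise from yarr
picked = np.where(cond, xarr, yarr)

print(picked)
```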

In [105]:
# the second and third arguments of np.where can be arrays or scalars
arr = np.random.randn(4,4)
arr

array([[-1.16922932,  0.8027948 ,  0.78022146,  1.53374021],
       [-1.36444738,  0.62813455,  1.67596873, -0.47679472],
       [ 0.07780301, -0.07070325,  0.37573902, -1.52072758],
       [ 0.8587546 , -0.51361337, -0.742135  ,  0.38358102]])

In [106]:
# we want to replace all positive values with 2 and all negative values with -2
np.where(arr > 0, 2, -2)

array([[-2,  2,  2,  2],
       [-2,  2,  2, -2],
       [ 2, -2,  2, -2],
       [ 2, -2, -2,  2]])

In [107]:
# or setting only positive values to 2
np.where(arr > 0, 2, arr)

array([[-1.16922932,  2.        ,  2.        ,  2.        ],
       [-1.36444738,  2.        ,  2.        , -0.47679472],
       [ 2.        , -0.07070325,  2.        , -1.52072758],
       [ 2.        , -0.51361337, -0.742135  ,  2.        ]])

In [108]:
'''
Consider the following example where we have two boolean arrays, cond1 and cond2, and wish to assign
a different value for each of the 4 possible pairs of boolean values.
Pure Python:
'''
cond1 = np.array([True, True, False, False])
cond2 = np.array([True, False, True, False])

result = []
for i in range(len(cond1)):
    if cond1[i] and cond2[i]:
        result.append(0)
    elif cond1[i]:
        result.append(1)
    elif cond2[i]:
        result.append(2)
    else:
        result.append(3)
result       

[0, 1, 2, 3]

In [109]:
# the same logic expressed with nested np.where calls
np.where(cond1 & cond2, 0, np.where(cond1, 1, np.where(cond2, 2, 3)))

array([0, 1, 2, 3])

In [110]:
# boolean values are treated as 1 (True) and 0 (False) in arithmetic
# so we can re-write the previous code as follows (negating with ~, since - is not supported on boolean arrays):
result = 1 * (cond1 & ~cond2) + 2 * (~cond1 & cond2) + 3 * (~cond1 & ~cond2)
result

array([0, 1, 2, 3])

### Mathematical and Statistical Methods

In [111]:
arr = np.random.randn(5, 4)
arr

array([[ -1.23161509e+00,  -7.47167828e-01,  -3.34071690e-02,
         -7.63297455e-01],
       [ -7.25405494e-01,  -5.69683481e-01,   9.29035233e-01,
         -7.80464267e-01],
       [ -3.92083178e-01,   1.93218323e+00,   9.74852292e-01,
         -1.60434810e+00],
       [  1.16166812e+00,  -3.18036080e-01,   5.35049738e-01,
          1.87839153e-03],
       [ -9.40189807e-01,   2.57922366e-01,   1.03930263e+00,
         -2.10149087e+00]])

In [112]:
arr.mean()

-0.16876484071768635

In [113]:
arr.sum()

-3.3752968143537272

In [114]:
arr.std()

1.0001838257224211

In [115]:
arr = np.array([[1, 2, 3],[4, 5, 6], [7, 8, 9]])
arr

array([[1, 2, 3],
       [4, 5, 6],
       [7, 8, 9]])

In [116]:
# mean along an axis: for a two-dimensional array, axis 0 runs down the rows (one mean per column)
# and axis 1 runs across the columns (one mean per row)
arr.mean(0)

array([ 4.,  5.,  6.])

In [117]:
arr.mean(axis = 1)

array([ 2.,  5.,  8.])

In [118]:
arr.sum(axis = 0)

array([12, 15, 18])

In [119]:
# Cumulative sum - starting from zero as sum
arr.cumsum(axis = 0)

array([[ 1,  2,  3],
       [ 5,  7,  9],
       [12, 15, 18]], dtype=int32)

In [120]:
# cumulative product - starting from one as product
arr.cumprod(axis = 1)

array([[  1,   2,   6],
       [  4,  20, 120],
       [  7,  56, 504]], dtype=int32)

#### Basic array statistical methods

sum<br />
mean<br />
std, var<br />
min, max<br />
argmin, argmax (Indices of minimum and maximum elements, respectively. By default, the index is for the flattened array)<br />
cumsum<br />
cumprod<br />

In [121]:
arr.min(axis = 0)

array([1, 2, 3])

In [122]:
# max index for flattened array
arr.argmax()

8
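
Because `argmax` without an axis returns an index into the flattened array, it is often useful to convert it back to a row and column. The sketch below uses `np.unravel_index` for that and also shows the per-axis form; the variable names are illustrative.

```python
import numpy as np

arr = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

flat_index = arr.argmax()                            # index into the flattened array
row, col = np.unravel_index(flat_index, arr.shape)   # back to 2D coordinates
per_column = arr.argmax(axis=0)                      # row index of the maximum in each column

print(flat_index, row, col)
print(per_column)
```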

#### Methods for Boolean Arrays

Boolean values are coerced to 1 (True) and 0 (False) in these methods, so sum counts the True values.

In [123]:
arr = np.random.randn(10)
arr

array([-1.28746661, -0.79571106,  0.60299865, -1.63847279, -0.05239923,
       -1.18049572,  0.88347491,  1.03529102,  1.09275135,  2.03862716])

In [124]:
arr > 0

array([False, False,  True, False, False, False,  True,  True,  True,  True], dtype=bool)

In [125]:
(arr > 0).sum()

5

In [126]:
# any() method returns True if any element is True
# (the array is named flags to avoid shadowing the built-in bool)
flags = np.array([False, False, True, False])
flags.any()

True

In [127]:
# all() method returns True if all elements are True
flags.all()

False

#### Sorting

In [None]:
arr = np.random.randn(10)
arr

In [128]:
arr.sort()
arr

array([-1.63847279, -1.28746661, -1.18049572, -0.79571106, -0.05239923,
        0.60299865,  0.88347491,  1.03529102,  1.09275135,  2.03862716])

In [129]:
arr = np.random.randn(3, 4)
arr

array([[-0.93754504, -0.3544121 ,  0.66438123,  0.65214082],
       [ 0.66347928,  1.62414868, -0.1019461 , -0.38990672],
       [ 0.90248581, -0.0160722 ,  0.18878971,  0.07363357]])

In [130]:
arr.sort(axis = 0)
arr

array([[-0.93754504, -0.3544121 , -0.1019461 , -0.38990672],
       [ 0.66347928, -0.0160722 ,  0.18878971,  0.07363357],
       [ 0.90248581,  1.62414868,  0.66438123,  0.65214082]])

In [131]:
arr.sort(axis = 1)
arr

array([[-0.93754504, -0.38990672, -0.3544121 , -0.1019461 ],
       [-0.0160722 ,  0.07363357,  0.18878971,  0.66347928],
       [ 0.65214082,  0.66438123,  0.90248581,  1.62414868]])

In [132]:
# finding the 5% quantile
large_array = np.random.randn(1000)
large_array.sort()
large_array[int(0.05 * len(large_array))]

-1.7416073336401103
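
Two related points, sketched below with made-up data: the `sort` method works in place while `np.sort` returns a sorted copy, and `np.percentile` offers a ready-made quantile that interpolates between neighbouring values, so it can differ slightly from the index-based approach above.

```python
import numpy as np

unsorted = np.array([3, 1, 2])
sorted_copy = np.sort(unsorted)     # returns a sorted copy; unsorted is unchanged

values = np.arange(1000.0)          # already sorted: 0.0, 1.0, ..., 999.0
by_index = values[int(0.05 * len(values))]   # element at position 50
by_percentile = np.percentile(values, 5)     # interpolates between neighbours

print(sorted_copy, unsorted)
print(by_index, by_percentile)
```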

#### Unique and Other Set Logic

In [None]:
names = np.array(['Bob', 'Joe', 'Will', 'Bob', 'Will', 'Joe', 'Joe'])
np.unique(names)

In [None]:
ints = np.array([3, 3, 3, 2, 2, 1, 1, 4, 4])
np.unique(ints)

In [None]:
# the pure Python equivalent of np.unique: a set removes the duplicates
sorted(set(names))
#names.sort()
#names

In [None]:
# compute a boolean array indicating whether each element of values is contained in the list
# (np.in1d is deprecated in NumPy 2.0 in favor of np.isin)
values = np.array([6, 0, 0, 3, 2, 5, 6])
np.in1d(values, [2, 3, 6]) 

In [None]:
# compute the sorted union of the elements
np.union1d(values, [200, 100])

In [None]:
# compute the sorted, common elements
np.intersect1d(values, [3, 2])

In [None]:
# set difference, elements in the first set but not the second one
np.setdiff1d(values, [0, 6, 10])

In [None]:
# set symmetric difference, elements that are in either of the arrays but not both
np.setxor1d(values, [0, 6, 10])
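
Since the cells above were not executed, the following sketch collects the same set operations in one place, with the results worked out by hand in the comments. It uses `np.isin` in place of `np.in1d`; the result variable names are illustrative.

```python
import numpy as np

values = np.array([6, 0, 0, 3, 2, 5, 6])

uniques = np.unique(values)                   # 0, 2, 3, 5, 6
membership = np.isin(values, [2, 3, 6])       # True where the element is 2, 3 or 6
union = np.union1d(values, [200, 100])        # 0, 2, 3, 5, 6, 100, 200
common = np.intersect1d(values, [3, 2])       # 2, 3
only_first = np.setdiff1d(values, [0, 6, 10]) # 2, 3, 5
exclusive = np.setxor1d(values, [0, 6, 10])   # 2, 3, 5, 10

print(uniques, union, common, only_first, exclusive)
print(membership)
```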


### File Input and Output With Arrays

#### Storing Arrays on Disk in Binary Format

In [None]:
# save the array to a file in binary format
# np.save appends the .npy extension if the file name does not already have it
arr = np.arange(10)
np.save('some_array', arr)

In [None]:
# loading the array in binary format from a file
arrloaded = np.load('some_array.npy')
arrloaded

In [None]:
# we can save multiple arrays in a zip archive using np.savez and passing the arrays as keyword arguments
np.savez('array_archive.npz', a = arr, b = arr)
archive = np.load('array_archive.npz')
archive['a']

In [None]:
archive['b']
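
The round trip can be sketched end to end in a temporary directory, so that no files are left behind. The use of `tempfile` and the variable names are choices made for this example, not part of the original session.

```python
import os
import tempfile
import numpy as np

arr = np.arange(10)

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "some_array")       # no extension given
    np.save(path, arr)                           # written as some_array.npy
    saved_files = os.listdir(tmp)
    loaded = np.load(path + ".npy")

    archive_path = os.path.join(tmp, "array_archive.npz")
    np.savez(archive_path, a=arr, b=arr * 2)
    with np.load(archive_path) as archive:       # closes the file when done
        names = sorted(archive.files)            # keys given as keyword arguments
        b_loaded = archive["b"]

print(saved_files, names)
print(loaded, b_loaded)
```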

#### Saving and Loading Text Files

We will discuss this using read_csv and read_table from the pandas library, but you can also refer to np.loadtxt and np.savetxt (the pandas functions are more convenient and less confusing).
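
For completeness, here is a minimal sketch of a text round trip with `np.savetxt` and `np.loadtxt`. The file name `table.csv` and the temporary directory are hypothetical choices for this example.

```python
import os
import tempfile
import numpy as np

table = np.array([[1.0, 2.0], [3.0, 4.0]])

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "table.csv")                # hypothetical file name
    np.savetxt(path, table, delimiter=",", fmt="%.1f")   # one row per line
    with open(path) as handle:
        text = handle.read()
    restored = np.loadtxt(path, delimiter=",")

print(text)
print(restored)
```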

### Linear Algebra

In [None]:
# example of matrix multiplication
x = np.array([[1, 2, 3], [4, 5, 6]])
y = np.array([[6, 23], [-1, 7], [8, 9]])
np.dot(x, y)

In [None]:
# the same
x.dot(y)

In [None]:
# numpy.linalg has a standard set of matrix decompositions and things like inverse and determinant
from numpy.linalg import inv, qr
X = np.random.randn(5,5)
# T for transpose
mat = X.T.dot(X)
# inverse of the matrix
inv(mat)

In [None]:
# it should give you the identity matrix, up to floating-point rounding error
mat.dot(inv(mat))

#### Commonly-used numpy.linalg functions

diag  (return the diagonal of the matrix; available as numpy.diag) <br />
dot   (matrix multiplication; available as numpy.dot) <br />
trace (sum of the main diagonal; available as numpy.trace) <br />
det   (determinant) <br />
eig   (eigenvalues and eigenvectors of a square matrix) <br />
inv   (inverse of a square matrix) <br />
qr    (QR decomposition) <br />
svd   (singular value decomposition) <br />
solve (solve Ax = b for x, where A is a square matrix) <br />

In [None]:
mat.trace()
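
A few of the functions listed above can be sketched on a small system whose answer is easy to check by hand. The matrix `A` and vector `b` are invented for this example.

```python
import numpy as np

A = np.array([[2.0, 1.0], [1.0, 3.0]])
b = np.array([3.0, 5.0])

x = np.linalg.solve(A, b)          # solves A x = b without forming the inverse
determinant = np.linalg.det(A)     # 2*3 - 1*1 = 5
eigenvalues, eigenvectors = np.linalg.eig(A)

# each column v of eigenvectors satisfies A v = (eigenvalue) v
residual = A.dot(eigenvectors) - eigenvectors * eigenvalues

print(x, determinant)
print(np.allclose(residual, 0.0))
```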

### Random Number Generation

In [133]:
# np.random supplements the built-in Python random module with functions that efficiently generate whole arrays of samples
# for example a 4 by 4 array of samples from standard normal distribution
samples = np.random.normal(size=(4,4))
samples

array([[ -2.85359342e-01,  -5.93157236e-01,   4.39983455e-01,
         -3.49360905e-01],
       [ -1.77589942e-01,  -6.03027220e-04,   1.81414779e-01,
         -6.91842559e-01],
       [ -2.76457213e-01,  -4.31252454e-02,   8.75293107e-01,
         -3.29350232e-01],
       [ -6.75788748e-01,  -4.44267574e-01,   7.48699209e-01,
          1.27172616e+00]])

#### Some of numpy.random functions

seed        &nbsp;&nbsp;&nbsp;&nbsp;(seed the random number generator) <br />
permutation &nbsp;&nbsp;&nbsp;&nbsp;(return a random permutation) <br />
shuffle     &nbsp;&nbsp;&nbsp;&nbsp;(randomly permute a sequence in place) <br />
rand        &nbsp;&nbsp;&nbsp;&nbsp;(draw samples from a uniform distribution over [0, 1)) <br />
randint     &nbsp;&nbsp;&nbsp;&nbsp;(draw random integers from a given low-to-high range) <br />
randn       &nbsp;&nbsp;&nbsp;&nbsp;(draw samples from a normal distribution with mean 0 and standard deviation 1) <br /> 
binomial    &nbsp;&nbsp;&nbsp;&nbsp;(draw samples from a binomial distribution) <br />
normal      &nbsp;&nbsp;&nbsp;&nbsp;(draw samples from a normal (Gaussian) distribution) <br />
beta        &nbsp;&nbsp;&nbsp;&nbsp;(draw samples from a beta distribution) <br />
chisquare   &nbsp;&nbsp;&nbsp;&nbsp;(draw samples from a chi-square distribution) <br />
gamma       &nbsp;&nbsp;&nbsp;&nbsp;(draw samples from a gamma distribution) <br />
uniform     &nbsp;&nbsp;&nbsp;&nbsp;(draw samples from a uniform distribution, [0, 1) by default)
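
As a short sketch of how a few of these fit together, the example below seeds the generator, shows that reseeding replays the same samples, and draws integers and a permutation. The seed value 12345 is arbitrary.

```python
import numpy as np

np.random.seed(12345)                     # arbitrary seed chosen for illustration
first = np.random.randn(3)

np.random.seed(12345)                     # reseeding replays the same sequence
second = np.random.randn(3)

ints = np.random.randint(0, 10, size=5)   # integers in [0, 10)
order = np.random.permutation(5)          # a shuffled copy of 0..4

print(np.array_equal(first, second))
print(ints, order)
```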