* Stands for Numerical Python. Useful for scientific computing that involves numerical data.
* Provides:
    - A fast and efficient multidimensional array called **ndarray**, with fast array-oriented arithmetic operations and flexible broadcasting capabilities.
    - Mathematical functions for fast operations on entire arrays without writing loops.
    - Functions for element-wise computation on arrays and for mathematical operations between arrays.
    - Tools for reading and writing array-based datasets to disk.
    - Linear algebra operations, Fourier transforms, and random number generation.
* ndarray is used to pass data between libraries and algorithms. It is more efficient for storing and manipulating numerical data than the built-in Python data structures. Libraries written in a lower-level language such as C can operate on the data stored in a NumPy array directly, without copying it into another memory representation.

#### Interacting with outside world
* Reading and writing a variety of file formats and data stores.

#### Preparation
* Cleaning, munging, combining, normalizing, reshaping, slicing, dicing and transforming data for analysis

#### Transformation
* Apply mathematical and statistical operations to groups of datasets to derive new datasets

#### Modeling and computation
* Connecting your data to statistical models, machine learning algorithms or other computational tools

#### Presentation
* Creating interactive visualizations or textual summaries.

### Why is NumPy efficient with large arrays of data?
- NumPy internally stores data in a contiguous block of memory, independent of other built-in Python objects.
- NumPy's algorithms are written in C and can operate on this memory without type checking or other overhead. NumPy arrays also use much less memory than built-in Python sequences.

In [2]:
import numpy as np

In [3]:
my_arr = np.arange(1000000)
my_lst = list(range(1000000))

In [5]:
%time for _ in range(10): my_arr2 = my_arr * 2

Wall time: 25.9 ms


In [6]:
%time for _ in range(10): my_lst2 = [x * 2 for x in my_lst]

Wall time: 861 ms
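
The timings above cover speed; the memory claim can be checked in a similar way. The following is a minimal sketch, assuming CPython (where `sys.getsizeof` reports per-object sizes) and an explicit `int64` dtype so the array size does not depend on the platform. The names `arr_bytes` and `lst_bytes` are only illustrative.

```python
import sys
import numpy as np

n = 1_000_000
my_arr = np.arange(n, dtype=np.int64)  # dtype fixed so the result is platform independent
my_lst = list(range(n))

# Size of the array's data buffer: n elements * 8 bytes each.
arr_bytes = my_arr.nbytes

# A list holds pointers to separate int objects, so count the list itself
# plus every int object it refers to.
lst_bytes = sys.getsizeof(my_lst) + sum(sys.getsizeof(x) for x in my_lst)

print("ndarray bytes:", arr_bytes)
print("list bytes:   ", lst_bytes)
```

On a typical 64-bit CPython build the list needs several times more memory than the array, because every element is a full Python object.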


# ndarray

* Generic multidimensional container for homogeneous data: all elements must be of the same type. Every array has a `shape` (a tuple indicating the size of each dimension) and a `dtype` (an object describing the data type of the array).

In [7]:
data = np.random.randn(2,3)

In [8]:
data

array([[ 1.41628744,  0.05148484, -0.934345  ],
       [-0.31061226,  0.44927702,  0.54621797]])

In [9]:
data * 10

array([[14.16287437,  0.51484838, -9.34345002],
       [-3.10612258,  4.49277017,  5.46217967]])

In [10]:
data + data

array([[ 2.83257487,  0.10296968, -1.86869   ],
       [-0.62122452,  0.89855403,  1.09243593]])

In [11]:
data.shape

(2, 3)

In [12]:
data.dtype

dtype('float64')

## Creating ndarray
### `array`
* Accepts any sequence-like object (including other arrays) and produces a new NumPy array containing the passed data.

In [13]:
lst = [6, 7.5, 8, 0, 1]

In [14]:
arr1 = np.array(lst)
arr1

array([6. , 7.5, 8. , 0. , 1. ])

In [15]:
lst2 = [[4,5,2,4], [6,2,5,4]]
lst2

[[4, 5, 2, 4], [6, 2, 5, 4]]

In [17]:
arr2 = np.array(lst2)

In [18]:
arr2

array([[4, 5, 2, 4],
       [6, 2, 5, 4]])

In [19]:
arr2.ndim

2

In [20]:
arr2.shape

(2, 4)

In [29]:
arr2.dtype

dtype('int32')

In [30]:
arr3 = np.array([1,2,3], dtype = np.float64)

In [31]:
arr3

array([1., 2., 3.])

In [32]:
np.array([1,2,3], dtype = np.int32)

array([1, 2, 3])

* Specifying `dtype` explicitly is useful when we are working with a stream of data.
* List of data types in NumPy:
![dtype](images/dtype.jpg)

### `zeros`
* Creates an array of 0s with the given length or shape.

In [22]:
np.zeros(10)

array([0., 0., 0., 0., 0., 0., 0., 0., 0., 0.])

In [23]:
np.zeros((3,6))

array([[0., 0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0., 0.]])

### `ones`
* Creates an array of 1s with the given length or shape.

In [24]:
np.ones(10)

array([1., 1., 1., 1., 1., 1., 1., 1., 1., 1.])

In [25]:
np.ones((3,6))

array([[1., 1., 1., 1., 1., 1.],
       [1., 1., 1., 1., 1., 1.],
       [1., 1., 1., 1., 1., 1.]])

### `empty`
* Creates an array without initializing its values to any particular value. The contents are whatever happened to be in memory, so do not rely on them.

In [27]:
np.empty((3,6,2))

array([[[1.86918699e-306, 1.69121096e-306],
        [1.95819673e-306, 7.56587585e-307],
        [1.37961302e-306, 1.05699242e-307],
        [1.95821439e-306, 2.22522597e-306],
        [1.60220528e-306, 8.45596650e-307],
        [1.11258854e-306, 1.60218491e-306]],

       [[1.11260348e-306, 1.11261027e-306],
        [9.34607074e-307, 7.56598449e-307],
        [4.22786781e-307, 1.27947519e-307],
        [3.00398891e-307, 1.05698138e-307],
        [1.27947264e-307, 4.45033446e-307],
        [3.11522393e-307, 3.00396515e-307]],

       [[1.02356521e-306, 1.78021391e-306],
        [8.06635958e-308, 6.89810244e-307],
        [1.22387550e-307, 1.24610927e-306],
        [5.33590898e-322, 0.00000000e+000],
        [0.00000000e+000, 0.00000000e+000],
        [0.00000000e+000, 0.00000000e+000]]])

### `asarray`
* Converts the input to an ndarray, but does not copy if the input is already an ndarray.
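
A short sketch of the no-copy behaviour described above, contrasted with `np.array`, which copies by default. The variable names are illustrative only.

```python
import numpy as np

arr = np.array([1, 2, 3])

same = np.asarray(arr)    # input is already an ndarray: no copy is made
copied = np.array(arr)    # np.array copies by default

same[0] = 99              # writes through to arr, because it is the same object

print(same is arr)        # True
print(arr[0], copied[0])  # 99 1

# A list is not an ndarray, so asarray has to build a new array from it.
from_list = np.asarray([1.5, 2.5])
```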

### `arange`
* Like the built-in `range`, but returns an ndarray.
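
A brief sketch of the common call forms. The start, stop, and step arguments follow the same order as `range`, with the difference that non-integer steps are accepted.

```python
import numpy as np

a = np.arange(5)           # 0, 1, 2, 3, 4
b = np.arange(2, 10, 3)    # start, stop (exclusive), step -> 2, 5, 8
c = np.arange(0, 1, 0.25)  # unlike range, a fractional step is allowed

print(a)
print(b)
print(c)
```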

### `ones_like`
* Takes another array and produces an array of ones with the same shape and dtype.

### `zeros_like`
* Takes another array and produces an array of zeros with the same shape and dtype.

### `empty_like`
* Takes another array and produces an array with the same shape and dtype, without initializing its values.

### `full`
* Produces an array of the given shape and dtype with all values set to the indicated fill value.

### `full_like`
* Takes another array and produces an array with the same shape and dtype, filled with the given value.
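
The `*_like` functions and `full` have no cells of their own, so here is a small sketch of how they behave. The array `template` and its `int16` dtype are arbitrary choices for illustration.

```python
import numpy as np

template = np.array([[1, 2, 3], [4, 5, 6]], dtype=np.int16)

ones = np.ones_like(template)      # shape (2, 3), dtype int16, filled with 1
zeros = np.zeros_like(template)    # shape (2, 3), dtype int16, filled with 0
sevens = np.full((2, 3), 7.0)      # explicit shape; dtype inferred from the fill value
nines = np.full_like(template, 9)  # shape and dtype taken from template

print(ones.dtype, zeros.dtype, sevens.dtype, nines.dtype)
```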

### `eye` `identity`
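* Create a square N x N identity matrix (ones on the diagonal, zeros elsewhere). `eye` can also build non-square matrices and offset diagonals.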
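
A minimal sketch comparing the two functions. The sizes are arbitrary.

```python
import numpy as np

i3 = np.identity(3)       # always square, ones on the main diagonal
e3 = np.eye(3)            # same result when given a single size
rect = np.eye(2, 3, k=1)  # 2 rows, 3 columns, ones on the first superdiagonal

print(rect)
```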

### Casting an array `astype`
* Casts an array from one dtype to another.

In [33]:
arr5 = np.array([1,2,3,4,5])

In [34]:
arr5.dtype

dtype('int32')

In [35]:
float_arr5 = arr5.astype(np.float64)

In [36]:
float_arr5

array([1., 2., 3., 4., 5.])

In [37]:
float_arr5.astype(np.int32)

array([1, 2, 3, 4, 5])

* If casting fails (for example, a string that cannot be parsed as a number), a `ValueError` is raised.
* By default, calling `astype` always creates a new array (a copy of the data), even if the new dtype is the same as the old dtype.
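
The two notes above can be illustrated with a short sketch. The sample values are made up; the point is the truncation when casting floats to integers, the new array returned for an unchanged dtype, and the `ValueError` on an unparseable string.

```python
import numpy as np

floats = np.array([3.7, -1.2, 0.5])
ints = floats.astype(np.int32)          # fractional part is truncated toward zero
print(ints)                             # [ 3 -1  0]

same_dtype = floats.astype(np.float64)  # dtype unchanged, but still a new array
print(same_dtype is floats)             # False

numeric_strings = np.array(["1.25", "-9.6", "42"])
parsed = numeric_strings.astype(np.float64)  # strings that look like numbers convert

try:
    np.array(["1.25", "abc"]).astype(np.float64)
except ValueError as exc:
    print("cast failed:", exc)
```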

### Arithmetic with NumPy arrays
* Vectorization: expressing batch operations on arrays without writing loops.
* Any arithmetic operation between equal-size arrays applies the operation element-wise.
* Arithmetic operations with a scalar propagate the scalar to each element in the array.
* Comparisons between arrays of the same size yield a boolean array.
* Operations between arrays of different sizes are called broadcasting.

In [43]:
arr6 = np.array([[1.,2.,3.,4.],[5.,6.,7.,8.]])

In [44]:
arr6

array([[1., 2., 3., 4.],
       [5., 6., 7., 8.]])

In [45]:
arr7 = np.array([[0.,4.,2.,6.], [1.,6.,4.,6.]])

In [46]:
arr7

array([[0., 4., 2., 6.],
       [1., 6., 4., 6.]])

In [47]:
arr6 * 2

array([[ 2.,  4.,  6.,  8.],
       [10., 12., 14., 16.]])

In [49]:
1 / arr6

array([[1.        , 0.5       , 0.33333333, 0.25      ],
       [0.2       , 0.16666667, 0.14285714, 0.125     ]])

In [50]:
arr6 + arr7

array([[ 1.,  6.,  5., 10.],
       [ 6., 12., 11., 14.]])

In [51]:
arr6 > arr7

array([[ True, False,  True, False],
       [ True, False,  True,  True]])
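
The cells above cover scalars and equal-size arrays but not broadcasting between arrays of different shapes. A minimal sketch, using made-up arrays `row` and `col`:

```python
import numpy as np

arr = np.array([[1., 2., 3., 4.],
                [5., 6., 7., 8.]])    # shape (2, 4)
row = np.array([10., 20., 30., 40.])  # shape (4,)
col = np.array([[100.], [200.]])      # shape (2, 1)

added_row = arr + row  # row is stretched across both rows of arr
added_col = arr + col  # col is stretched across all four columns

print(added_row)
print(added_col)
```

Shapes are compared from the trailing dimension backwards; two dimensions are compatible when they are equal or one of them is 1.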

### Basic indexing and slicing
* Select a subset of your data or individual elements.

In [52]:
arr5

array([1, 2, 3, 4, 5])

In [53]:
arr5[3]

4

In [55]:
arr5[2:4]

array([3, 4])

In [56]:
arr5[2:4] = 2

In [57]:
arr5

array([1, 2, 2, 2, 5])

* When a scalar value is assigned to a slice, it is propagated (broadcast) to the entire selection.
* Array slices are views on the original array: the data is not copied, and any modification to the view is reflected in the source array.
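
A small sketch of the view behaviour, using a fresh array so the notebook's `arr5` is not disturbed. `np.shares_memory` is used here only to confirm that both names refer to the same buffer.

```python
import numpy as np

arr = np.arange(10)
view = arr[5:8]       # a view: no data is copied

view[1] = 12345       # writes to arr[6]
single = int(arr[6])  # 12345

view[:] = 64          # the bare slice [:] assigns to every element of the view
print(arr)            # [ 0  1  2  3  4 64 64 64  8  9]
print(np.shares_memory(arr, view))  # True
```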

In [58]:
arr5[:]

array([1, 2, 2, 2, 5])

* To get a copy of the data instead of a view, call `copy` explicitly:

In [61]:
arr5[1:4].copy()

array([2, 2, 2])

In [62]:
arr9 = np.array([[1,2,3], [4,5,6], [7,8,9]])

In [63]:
arr9

array([[1, 2, 3],
       [4, 5, 6],
       [7, 8, 9]])

In [64]:
arr9[1]

array([4, 5, 6])

In [65]:
arr9[1][2]

6

In [66]:
arr9[1,2]

6

In [67]:
arr10 = np.array([[[1,2,3], [4,5,6]], [[7,8,9], [10,11,12]]])

In [68]:
arr10

array([[[ 1,  2,  3],
        [ 4,  5,  6]],

       [[ 7,  8,  9],
        [10, 11, 12]]])

In [69]:
arr10[0]

array([[1, 2, 3],
       [4, 5, 6]])

In [70]:
arr10[0] = 25

In [71]:
arr10

array([[[25, 25, 25],
        [25, 25, 25]],

       [[ 7,  8,  9],
        [10, 11, 12]]])

In [72]:
arr10[1,0]

array([7, 8, 9])

In [73]:
arr10[0:]

array([[[25, 25, 25],
        [25, 25, 25]],

       [[ 7,  8,  9],
        [10, 11, 12]]])

In [74]:
arr10[1,1]

array([10, 11, 12])

In [76]:
arr10[1,1,1:]

array([11, 12])

### Boolean Indexing

In [77]:
names = np.array(['Bob', 'Joe', 'Will', 'Joe', 'Joe', 'Bob'])

In [78]:
arr11 = np.random.randn(6, 4)

In [79]:
arr11

array([[ 0.29426643,  0.38626286,  0.23899999, -0.09576053],
       [-0.76820038,  0.45928024,  0.30015451,  0.35330379],
       [ 0.57308591, -1.07344487,  0.08729667,  0.54744926],
       [-1.55876137, -0.238978  ,  0.80911292, -0.03028328],
       [ 1.26821051,  0.75057994,  0.91632546,  0.55843202],
       [ 0.07123081, -0.59537382,  1.06253369,  0.45504423]])

In [80]:
names == 'Bob'

array([ True, False, False, False, False,  True])

In [81]:
arr11[names == 'Bob']

array([[ 0.29426643,  0.38626286,  0.23899999, -0.09576053],
       [ 0.07123081, -0.59537382,  1.06253369,  0.45504423]])

* The boolean array must have the same length as the array axis it is indexing.

In [82]:
arr11[names == 'Joe', 2:]

array([[ 0.30015451,  0.35330379],
       [ 0.80911292, -0.03028328],
       [ 0.91632546,  0.55843202]])

* `~` inverts a boolean condition.

In [85]:
arr11[~(names == 'Bob')]

array([[-0.76820038,  0.45928024,  0.30015451,  0.35330379],
       [ 0.57308591, -1.07344487,  0.08729667,  0.54744926],
       [-1.55876137, -0.238978  ,  0.80911292, -0.03028328],
       [ 1.26821051,  0.75057994,  0.91632546,  0.55843202]])

In [86]:
arr11[(names == 'Bob') | (names == 'Joe')]

array([[ 0.29426643,  0.38626286,  0.23899999, -0.09576053],
       [-0.76820038,  0.45928024,  0.30015451,  0.35330379],
       [-1.55876137, -0.238978  ,  0.80911292, -0.03028328],
       [ 1.26821051,  0.75057994,  0.91632546,  0.55843202],
       [ 0.07123081, -0.59537382,  1.06253369,  0.45504423]])

* Selecting data from an array by boolean indexing ALWAYS creates a copy of the data, even if the returned array is unchanged. The Python keywords `and` and `or` do NOT work with boolean arrays; use `&`, `|`, and `~` instead.
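
Both points in the note above can be checked with a short sketch. The array `data` is a stand-in with predictable values rather than the random `arr11`.

```python
import numpy as np

names = np.array(['Bob', 'Joe', 'Will', 'Joe', 'Joe', 'Bob'])
data = np.arange(24).reshape(6, 4)

# Each comparison needs its own parentheses because | binds tighter than ==.
mask = (names == 'Bob') | (names == 'Will')
selected = data[mask]

selected[:] = -1  # changes only the copy
print(data[0])    # [0 1 2 3]: the original rows are untouched

try:
    (names == 'Bob') or (names == 'Will')
except ValueError as exc:
    # `or` asks the array for a single truth value, which is ambiguous
    print("error:", exc)
```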

In [88]:
arr11[arr11 < 0] = 0

In [89]:
arr11

array([[0.29426643, 0.38626286, 0.23899999, 0.        ],
       [0.        , 0.45928024, 0.30015451, 0.35330379],
       [0.57308591, 0.        , 0.08729667, 0.54744926],
       [0.        , 0.        , 0.80911292, 0.        ],
       [1.26821051, 0.75057994, 0.91632546, 0.55843202],
       [0.07123081, 0.        , 1.06253369, 0.45504423]])

In [90]:
arr11[names == 'Joe'] = 7

In [91]:
arr11

array([[0.29426643, 0.38626286, 0.23899999, 0.        ],
       [7.        , 7.        , 7.        , 7.        ],
       [0.57308591, 0.        , 0.08729667, 0.54744926],
       [7.        , 7.        , 7.        , 7.        ],
       [7.        , 7.        , 7.        , 7.        ],
       [0.07123081, 0.        , 1.06253369, 0.45504423]])

### Fancy Indexing
* Indexing using integer arrays.

In [92]:
arr12 = np.empty((8,4))

In [93]:
for i in range(8):
    arr12[i] = i

In [94]:
arr12

array([[0., 0., 0., 0.],
       [1., 1., 1., 1.],
       [2., 2., 2., 2.],
       [3., 3., 3., 3.],
       [4., 4., 4., 4.],
       [5., 5., 5., 5.],
       [6., 6., 6., 6.],
       [7., 7., 7., 7.]])

* To select a subset of rows in a particular order, pass a list or ndarray of integers:

In [95]:
arr12[[4,3,0,6]]

array([[4., 4., 4., 4.],
       [3., 3., 3., 3.],
       [0., 0., 0., 0.],
       [6., 6., 6., 6.]])

### `reshape`

In [96]:
arr13 = np.arange(32).reshape((8,4))

In [97]:
arr13

array([[ 0,  1,  2,  3],
       [ 4,  5,  6,  7],
       [ 8,  9, 10, 11],
       [12, 13, 14, 15],
       [16, 17, 18, 19],
       [20, 21, 22, 23],
       [24, 25, 26, 27],
       [28, 29, 30, 31]])

In [101]:
arr13[[1,5,7,2]]

array([[ 4,  5,  6,  7],
       [20, 21, 22, 23],
       [28, 29, 30, 31],
       [ 8,  9, 10, 11]])

* Fancy indexing ALWAYS copies the data into a new array.
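
A few more fancy-indexing forms, sketched on the same 8 x 4 layout as `arr13`. Passing two index arrays selects individual elements pairwise, which is often not what people expect, so the row-then-column form is shown next to it.

```python
import numpy as np

arr = np.arange(32).reshape((8, 4))

rows = arr[[-3, -5, -7]]                    # negative indices count from the end
picked = arr[[1, 5, 7, 2], [0, 3, 1, 2]]    # elements (1,0), (5,3), (7,1), (2,2)
block = arr[[1, 5, 7, 2]][:, [0, 3, 1, 2]]  # select rows, then reorder columns

rows[0, 0] = -1   # modifies only the copy
print(arr[5, 0])  # 20: the source array is unchanged
print(picked)     # [ 4 23 29 10]
print(block)
```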

### Transpose arrays and swapping axes

* Transposing returns a view of the underlying data without copying anything.

In [102]:
arr12.T

array([[0., 1., 2., 3., 4., 5., 6., 7.],
       [0., 1., 2., 3., 4., 5., 6., 7.],
       [0., 1., 2., 3., 4., 5., 6., 7.],
       [0., 1., 2., 3., 4., 5., 6., 7.]])

In [104]:
arr14 = np.arange(16).reshape((2,2,4))
arr14

array([[[ 0,  1,  2,  3],
        [ 4,  5,  6,  7]],

       [[ 8,  9, 10, 11],
        [12, 13, 14, 15]]])

In [106]:
arr14.transpose((1, 0, 2))
# Axes reordered: second axis first, first axis second, last axis unchanged.

array([[[ 0,  1,  2,  3],
        [ 8,  9, 10, 11]],

       [[ 4,  5,  6,  7],
        [12, 13, 14, 15]]])

### Inner matrix product `np.dot`

In [103]:
np.dot(arr12, arr12.T)

array([[  0.,   0.,   0.,   0.,   0.,   0.,   0.,   0.],
       [  0.,   4.,   8.,  12.,  16.,  20.,  24.,  28.],
       [  0.,   8.,  16.,  24.,  32.,  40.,  48.,  56.],
       [  0.,  12.,  24.,  36.,  48.,  60.,  72.,  84.],
       [  0.,  16.,  32.,  48.,  64.,  80.,  96., 112.],
       [  0.,  20.,  40.,  60.,  80., 100., 120., 140.],
       [  0.,  24.,  48.,  72.,  96., 120., 144., 168.],
       [  0.,  28.,  56.,  84., 112., 140., 168., 196.]])

### `swapaxes`
* Takes a pair of axis numbers and switches the indicated axes to rearrange the data.
* Returns a view of the data without making a copy.

In [107]:
arr14

array([[[ 0,  1,  2,  3],
        [ 4,  5,  6,  7]],

       [[ 8,  9, 10, 11],
        [12, 13, 14, 15]]])

In [108]:
arr14.swapaxes(1,2)

array([[[ 0,  4],
        [ 1,  5],
        [ 2,  6],
        [ 3,  7]],

       [[ 8, 12],
        [ 9, 13],
        [10, 14],
        [11, 15]]])
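
Because `swapaxes` and `transpose` return views, writing through the result changes the source array. A minimal sketch on a fresh array with the same layout as `arr14`:

```python
import numpy as np

arr = np.arange(16).reshape((2, 2, 4))

swapped = arr.swapaxes(1, 2)           # shape (2, 4, 2)
transposed = arr.transpose((1, 0, 2))  # shape (2, 2, 4)

swapped[0, 3, 1] = -7                  # same element as arr[0, 1, 3]

print(arr[0, 1, 3])         # -7
print(transposed[1, 0, 3])  # -7, seen through the other view as well
print(np.shares_memory(arr, swapped), np.shares_memory(arr, transposed))  # True True
```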

### Universal Functions

In [109]:
arr15 = np.arange(10)

In [110]:
arr15

array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

In [111]:
np.sqrt(arr15)

array([0.        , 1.        , 1.41421356, 1.73205081, 2.        ,
       2.23606798, 2.44948974, 2.64575131, 2.82842712, 3.        ])

In [112]:
np.exp(arr15)

array([1.00000000e+00, 2.71828183e+00, 7.38905610e+00, 2.00855369e+01,
       5.45981500e+01, 1.48413159e+02, 4.03428793e+02, 1.09663316e+03,
       2.98095799e+03, 8.10308393e+03])

In [115]:
arr16 = np.random.randn(10)

In [116]:
arr16

array([ 1.13765361, -0.26637203,  0.55097738, -0.16156892,  0.94079048,
        1.65857647,  1.11492225, -0.42036902,  0.12325915,  0.15138486])

In [117]:
np.maximum(arr15, arr16) # element wise maximum

array([1.13765361, 1.        , 2.        , 3.        , 4.        ,
       5.        , 6.        , 7.        , 8.        , 9.        ])

### `modf`
* Returns the fractional and integral parts of a floating-point array as two separate arrays.

In [118]:
arr16 = arr16 * 5

In [119]:
arr16

array([ 5.68826804, -1.33186013,  2.75488689, -0.80784458,  4.70395238,
        8.29288236,  5.57461124, -2.10184509,  0.61629577,  0.75692429])

In [120]:
remainder, whole = np.modf(arr16)

In [121]:
remainder

array([ 0.68826804, -0.33186013,  0.75488689, -0.80784458,  0.70395238,
        0.29288236,  0.57461124, -0.10184509,  0.61629577,  0.75692429])

In [122]:
whole

array([ 5., -1.,  2., -0.,  4.,  8.,  5., -2.,  0.,  0.])

![unary function list](images/unary.jpg)
![binary function list](images/binary.jpg)

### Conditional logic as Array operation

In [127]:
xarr = np.array([1.1, 1.2, 1.3, 1.4, 1.5, 1.6])
yarr = np.array([2.1, 2.2, 2.3, 2.4, 2.5, 2.7])
cond = (names == 'Joe')
cond

array([False,  True, False,  True,  True, False])

In [130]:
result = [x if c else y for x, y, c in zip(xarr, yarr, cond)]

In [131]:
result

[2.1, 1.2, 2.3, 1.4, 1.5, 2.7]

* The list comprehension above is not efficient on large arrays, because all the work is done in interpreted Python code. It also does not work with multidimensional arrays.

### `where`
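* `np.where(cond, x, y)` is a vectorized version of `x if cond else y`: it produces a new array, taking values from `x` where `cond` is true and from `y` otherwise.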

In [134]:
result = np.where(cond, xarr, yarr)

In [135]:
result

array([2.1, 1.2, 2.3, 1.4, 1.5, 2.7])

In [136]:
arr17 = np.random.randn(4,4)

In [137]:
arr17

array([[-1.07347985,  1.69614859, -0.321126  ,  1.20429087],
       [-0.43869145,  1.83777457, -0.5098406 ,  0.7492085 ],
       [-1.43127715, -1.22140879, -0.78070367, -0.71630547],
       [ 0.83951961,  0.69358869, -2.18963284,  0.32657169]])

In [139]:
np.where(arr17 > 0, 2, -2) # replace negative value with -2 and positive with 2

array([[-2,  2, -2,  2],
       [-2,  2, -2,  2],
       [-2, -2, -2, -2],
       [ 2,  2, -2,  2]])

In [141]:
np.where(arr17 > 0, 2, arr17) # only replace +ve value with 2

array([[-1.07347985,  2.        , -0.321126  ,  2.        ],
       [-0.43869145,  2.        , -0.5098406 ,  2.        ],
       [-1.43127715, -1.22140879, -0.78070367, -0.71630547],
       [ 2.        ,  2.        , -2.18963284,  2.        ]])

### Statistical methods

In [142]:
arr18 = np.random.randn(5,4)

In [143]:
arr18

array([[-0.88160602, -1.35364929,  1.37395697, -1.50672548],
       [-1.51938775, -0.43125686, -0.51644009, -0.7657337 ],
       [-0.96073927,  0.21907154, -1.58418052,  0.65411158],
       [-0.58445448, -0.43516783,  0.4092958 , -0.43817255],
       [-0.84604174,  0.31782062,  0.97009223,  1.35738876]])

In [144]:
arr18.mean()

-0.3260909032685605

In [145]:
np.mean(arr18)

-0.3260909032685605

In [146]:
arr18.sum()

-6.52181806537121

* We can supply an optional `axis` argument to compute the statistic over that axis.

In [147]:
arr18.mean(axis = 1) # compute mean across the columns (one value per row)

array([-0.59200596, -0.8082046 , -0.41793416, -0.26212476,  0.44981497])

In [148]:
arr18.sum(axis = 0) # compute sum down the rows

array([-4.79222926, -1.68318181,  0.6527244 , -0.69913139])

### `cumsum`, `cumprod`

In [149]:
arr19 = np.arange(8)

In [150]:
arr19

array([0, 1, 2, 3, 4, 5, 6, 7])

In [151]:
arr19.cumsum()

array([ 0,  1,  3,  6, 10, 15, 21, 28], dtype=int32)

In [153]:
arr20 = np.array([[0,1,2],[3,4,5],[6,7,8]])

In [154]:
arr20

array([[0, 1, 2],
       [3, 4, 5],
       [6, 7, 8]])

In [155]:
arr20.cumsum(axis = 0)

array([[ 0,  1,  2],
       [ 3,  5,  7],
       [ 9, 12, 15]], dtype=int32)

In [157]:
arr20.cumprod(axis = 1)

array([[  0,   0,   0],
       [  3,  12,  60],
       [  6,  42, 336]], dtype=int32)

![stat function list](images/stat_method.jpg)


### Methods for boolean array

In [159]:
arr21 = np.random.randn(100)

In [160]:
(arr21 > 0).sum() # number of positive value

48

In [163]:
arr22 = np.array([False, True, False, False, True])

In [164]:
arr22.all() # are all values True?

False

In [166]:
arr22.any() # is any value True?

True

### Sorting
* The `sort` method sorts the array in place.

In [167]:
arr23 = np.random.randn(6)

In [168]:
arr23.sort()

In [169]:
arr23

array([-1.6896255 , -1.5693796 , -0.59079231,  0.53718812,  0.75804368,
        1.30487954])

In [170]:
arr24 = np.random.randn(5,3)

In [171]:
arr24

array([[ 0.67812169, -1.26181789, -0.28693387],
       [ 0.14558359, -0.18203356, -0.37587802],
       [-0.30296841,  1.2520618 , -0.03685637],
       [ 0.53468557,  1.05052261, -0.81958013],
       [ 2.0982914 , -0.40243891, -1.18833012]])

In [172]:
arr24.sort(1)

In [173]:
arr24

array([[-1.26181789, -0.28693387,  0.67812169],
       [-0.37587802, -0.18203356,  0.14558359],
       [-0.30296841, -0.03685637,  1.2520618 ],
       [-0.81958013,  0.53468557,  1.05052261],
       [-1.18833012, -0.40243891,  2.0982914 ]])

* `np.sort(arr)` returns a sorted copy of the array instead of modifying it in place.
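
A quick sketch of the difference between the function and the method, using a small made-up array:

```python
import numpy as np

arr = np.array([3, 1, 2])

sorted_copy = np.sort(arr)  # new array; arr keeps its original order
print(arr)                  # [3 1 2]
print(sorted_copy)          # [1 2 3]

result = arr.sort()         # sorts arr itself and returns None
print(result)               # None
print(arr)                  # [1 2 3]

descending = np.sort(arr)[::-1]  # one common way to get descending order
```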

### `unique`
* Returns the sorted unique values in an array.

In [175]:
names

array(['Bob', 'Joe', 'Will', 'Joe', 'Joe', 'Bob'], dtype='<U4')

In [176]:
np.unique(names)

array(['Bob', 'Joe', 'Will'], dtype='<U4')

In [177]:
sorted(set(names))

['Bob', 'Joe', 'Will']

### `np.in1d`
* Tests membership of the values of one array in another, returning a boolean array.

In [178]:
values = np.array([6,0,0,3,2,5,6])
np.in1d(values, [2,3,6])

array([ True, False, False,  True,  True, False,  True])
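
Recent NumPy releases (2.0 and later) deprecate `np.in1d` in favor of `np.isin`, which performs the same membership test and also preserves the shape of multidimensional input. A sketch of the same check with `np.isin`:

```python
import numpy as np

values = np.array([6, 0, 0, 3, 2, 5, 6])

mask = np.isin(values, [2, 3, 6])
print(mask)          # [ True False False  True  True False  True]
print(values[mask])  # [6 3 2 6]

grid = values[:6].reshape(2, 3)
grid_mask = np.isin(grid, [2, 3, 6])  # result has the same (2, 3) shape as grid
```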

![set function list](images/set.jpg)


### `np.save`, `np.load`
* Save and load array data on disk. Arrays are saved by default in an uncompressed raw binary format with the file extension `.npy`.

In [180]:
arr25 = np.arange(10)

In [181]:
arr25

array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

In [182]:
np.save('some_array', arr25)

In [183]:
np.load('some_array.npy')

array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

* To store multiple arrays in an uncompressed archive, use `np.savez` and pass the arrays as keyword arguments.

In [185]:
np.savez('array_archive.npz', a = arr25, b = arr24)

* Loading an `.npz` archive returns a dict-like object that loads the individual arrays lazily.

In [186]:
arch = np.load('array_archive.npz')

In [187]:
arch['a']

array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

In [188]:
arch['b']

array([[-1.26181789, -0.28693387,  0.67812169],
       [-0.37587802, -0.18203356,  0.14558359],
       [-0.30296841, -0.03685637,  1.2520618 ],
       [-0.81958013,  0.53468557,  1.05052261],
       [-1.18833012, -0.40243891,  2.0982914 ]])

* If the data compresses well, use `np.savez_compressed` instead.
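
A minimal sketch of `np.savez_compressed`, written to a temporary directory so it leaves nothing behind. The file name `arrays_compressed.npz` and the sample arrays are hypothetical.

```python
import os
import tempfile
import numpy as np

a = np.arange(10)
b = np.zeros((100, 100))  # highly repetitive data, so it compresses well

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "arrays_compressed.npz")  # hypothetical file name
    np.savez_compressed(path, a=a, b=b)
    size_on_disk = os.path.getsize(path)

    # Using the loaded archive as a context manager closes the file afterwards.
    with np.load(path) as arch:
        keys = sorted(arch.files)
        loaded_a = arch["a"]
        loaded_b = arch["b"]

print(keys)                    # ['a', 'b']
print(size_on_disk, b.nbytes)  # the archive is far smaller than the raw data
```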

### Linear Algebra

In [190]:
arr26 = np.random.randn(3,5)
arr27 = np.random.randn(5,3)

In [192]:
arr26.dot(arr27) # Matrix multiplication

array([[ 1.02742322,  1.69412475, -2.37566705],
       [-0.90826934,  0.32952574,  0.82008409],
       [-1.38679912, -0.2889322 ,  0.79263054]])

In [194]:
arr26 @ arr27 # Also performs matrix multiplication

array([[ 1.02742322,  1.69412475, -2.37566705],
       [-0.90826934,  0.32952574,  0.82008409],
       [-1.38679912, -0.2889322 ,  0.79263054]])

### `numpy.linalg`
* Contains standard matrix decompositions and operations such as inverse and determinant.



In [195]:
from numpy.linalg import inv, qr

In [196]:
arr28 = np.random.randn(5,5)

In [197]:
arr29 = arr28.T.dot(arr28)

In [198]:
inv(arr29)

array([[ 2.62845833, -4.83733975,  2.63326725,  4.03883974, -0.44681823],
       [-4.83733975, 10.58965788, -5.37057107, -8.29284009,  1.1885627 ],
       [ 2.63326725, -5.37057107,  3.23855356,  4.3401482 , -0.73977015],
       [ 4.03883974, -8.29284009,  4.3401482 ,  6.85401388, -0.88360248],
       [-0.44681823,  1.1885627 , -0.73977015, -0.88360248,  0.30110078]])
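
The cell above imports `qr` but only uses `inv`. The sketch below checks the inverse and adds a QR decomposition. It uses a seeded `np.random.default_rng` generator (an assumption made only so the example is repeatable; the notebook itself uses `np.random.randn`).

```python
import numpy as np
from numpy.linalg import inv, qr

rng = np.random.default_rng(0)   # seeded only to make the sketch repeatable
x = rng.standard_normal((5, 5))
mat = x.T @ x                    # symmetric 5 x 5 matrix, as in the cell above

identity_check = mat @ inv(mat)  # approximately the identity matrix
q, r = qr(mat)                   # mat = q @ r, q orthogonal, r upper triangular

print(np.round(identity_check, 6))
print(np.allclose(q @ r, mat))   # True
```

Because of floating-point rounding, `mat @ inv(mat)` is only approximately the identity, so comparisons should use `np.allclose` rather than `==`.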

![matrix function list](images/matrix.jpg)


### Random number generation

#### Standard normal distribution `random.normal`

In [205]:
arr30 = np.random.normal(size = (4,4))

In [206]:
arr30

array([[ 1.02918214,  0.13560179, -0.14121589,  0.69445818],
       [-0.01784018,  1.18517322,  0.48112515,  0.28709836],
       [ 0.42746948,  0.98962949,  1.39922147,  1.1506415 ],
       [-0.89208229,  0.2358664 , -2.08270477,  0.62644147]])

In [232]:
height = np.random.normal(1.75,0.20,5000) # distribution mean, SD, number of samples

* Python's built-in `random` module samples only one value at a time, whereas `np.random` can generate whole arrays of samples at once.

* To set the seed of NumPy's random number generator:

In [207]:
np.random.seed(1234)
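
Setting the seed makes the sequence of samples repeatable. The sketch below shows this with the global seed and, as an alternative not used elsewhere in this notebook, with a separate generator object from `np.random.default_rng`, which keeps its own state.

```python
import numpy as np

np.random.seed(1234)
first = np.random.normal(size=3)

np.random.seed(1234)  # resetting the seed replays the same sequence
second = np.random.normal(size=3)
print(np.array_equal(first, second))  # True

# A generator object has private state, so other code cannot disturb it.
rng = np.random.default_rng(1234)
heights = rng.normal(loc=1.75, scale=0.20, size=5000)  # mean, SD, number of samples
print(heights.shape)
```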

![random function list](images/random.jpg)


### How can ndarray show a view of data without copying anything?
- It keeps striding information that lets it move through the same block of memory with varying step sizes.
- Internally, an ndarray consists of:
    - A pointer to the data
    - The dtype
    - A tuple indicating the shape
    - A tuple of strides: integers indicating the number of bytes to step in order to advance one element along a dimension

In [208]:
np.ones((3,4,5), dtype = np.float64).strides

(160, 40, 8)
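
In the result above, each `float64` element takes 8 bytes, so stepping along the last axis moves 8 bytes, along the middle axis 5 * 8 = 40 bytes, and along the first axis 4 * 5 * 8 = 160 bytes. The sketch below shows how slicing and transposing change only the strides, which is why they can return views. The `int64` dtype is chosen so the numbers are the same on every platform.

```python
import numpy as np

arr = np.arange(12, dtype=np.int64).reshape((3, 4))
print(arr.strides)          # (32, 8): 4 elements * 8 bytes per row, 8 bytes per element

every_other = arr[:, ::2]   # skip every second column
print(every_other.strides)  # (32, 16): same rows, but column steps are doubled

transposed = arr.T
print(transposed.strides)   # (8, 32): the two strides are simply swapped

print(np.shares_memory(arr, every_other), np.shares_memory(arr, transposed))  # True True
```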

### `np.integer` `np.floating`
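* Superclasses of the concrete NumPy types. They are useful with `np.issubdtype` to check whether a dtype is any kind of integer or floating-point type.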

In [209]:
ints = np.ones(10, dtype = np.uint16)

In [210]:
floats = np.ones(10, dtype = np.float32)

In [211]:
np.issubdtype(ints.dtype, np.integer)

True

In [212]:
np.issubdtype(floats.dtype, np.floating)

True

### `mro`
* To list all parent classes of a specific dtype, call the type's `mro` method.

In [213]:
np.float64.mro()

[numpy.float64,
 numpy.floating,
 numpy.inexact,
 numpy.number,
 numpy.generic,
 float,
 object]

### Reshaping arrays `reshape`

In [216]:
ints = np.arange(10)

In [217]:
ints.reshape((5,2))

array([[0, 1],
       [2, 3],
       [4, 5],
       [6, 7],
       [8, 9]])

In [218]:
ints.reshape((5,2), order = 'C') # Row major, default

array([[0, 1],
       [2, 3],
       [4, 5],
       [6, 7],
       [8, 9]])

In [219]:
ints.reshape((5,2), order = 'F') # Column major

array([[0, 5],
       [1, 6],
       [2, 7],
       [3, 8],
       [4, 9]])

In [220]:
ints.reshape((5,-1)) # One dimension can be -1; its size is inferred from the data.

array([[0, 1],
       [2, 3],
       [4, 5],
       [6, 7],
       [8, 9]])

### Flattening and raveling

In [221]:
arr31 = np.arange(15).reshape((5,3))

In [222]:
arr31

array([[ 0,  1,  2],
       [ 3,  4,  5],
       [ 6,  7,  8],
       [ 9, 10, 11],
       [12, 13, 14]])

In [223]:
arr31.ravel() # Does not produce copy if underlying values are contiguous in the original array

array([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14])

In [224]:
arr31.flatten() # Always return copy

array([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14])
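
The two outputs above look identical, so the copy behaviour is easier to see with `np.shares_memory`. A minimal sketch on the same 5 x 3 layout as `arr31`:

```python
import numpy as np

arr = np.arange(15).reshape((5, 3))

raveled = arr.ravel()      # contiguous source: a view, no copy
flattened = arr.flatten()  # always a copy

print(np.shares_memory(arr, raveled))    # True
print(np.shares_memory(arr, flattened))  # False

non_contiguous = arr[:, ::2]         # every other column: not contiguous in memory
raveled_nc = non_contiguous.ravel()  # here ravel has to copy
print(np.shares_memory(non_contiguous, raveled_nc))  # False
```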

### Concatenating and splitting arrays
* `np.concatenate` takes a sequence of arrays and joins them together in order along the given axis.

In [225]:
arr32 = np.array([[1,2,3], [4,5,6]])
arr33 = np.array([[7,8,9],[10,11,12]])

In [227]:
np.concatenate([arr32, arr33], axis = 0)

array([[ 1,  2,  3],
       [ 4,  5,  6],
       [ 7,  8,  9],
       [10, 11, 12]])

In [228]:
np.concatenate([arr32, arr33], axis = 1)

array([[ 1,  2,  3,  7,  8,  9],
       [ 4,  5,  6, 10, 11, 12]])

In [230]:
np.hstack((arr32, arr33))

array([[ 1,  2,  3,  7,  8,  9],
       [ 4,  5,  6, 10, 11, 12]])

In [231]:
np.vstack((arr32, arr33))

array([[ 1,  2,  3],
       [ 4,  5,  6],
       [ 7,  8,  9],
       [10, 11, 12]])
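
The heading also mentions splitting, which the cells above do not show. A minimal sketch with `np.split`, which cuts an array at the given indices along an axis (axis 0 by default); the array and split points are arbitrary.

```python
import numpy as np

arr = np.arange(12).reshape((6, 2))

# Split before row 1 and before row 3: rows [0], rows [1, 2], rows [3, 4, 5].
first, second, third = np.split(arr, [1, 3])

print(first.shape, second.shape, third.shape)  # (1, 2) (2, 2) (3, 2)

# Concatenating the pieces in order restores the original array.
rejoined = np.concatenate([first, second, third], axis=0)
print(np.array_equal(rejoined, arr))           # True
```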