##### NumPy - Numerical Python

NumPy is a fundamental package for high-performance scientific computing and data analysis.

It provides the following:

1. ndarray - a fast and space-efficient multidimensional array providing vectorized arithmetic operations and sophisticated broadcasting capabilities.

2. Standard mathematical functions for fast operations on entire arrays of data without having to write loops.

3. Tools for reading/writing array data to disk and working with memory-mapped files.

4. Linear algebra, random number generation, and Fourier transform capabilities.

5. Tools for integrating code written in C, C++, and Fortran. NumPy provides an easy-to-use C API, so it is easy to pass data to external libraries written in low-level languages, and for those libraries to return data to Python as NumPy arrays.

In [4]:
import numpy as np
print(len(dir(np)))
print(dir(np))

622


##### We'll focus on the following:

• Fast vectorized array operations for data munging and cleaning, subsetting and filtering, transformation, and any other kinds of computations

• Common array algorithms like sorting, unique, and set operations

• Efficient descriptive statistics and aggregating/summarizing data

• Data alignment and relational data manipulations for merging and joining together heterogeneous data sets

• Expressing conditional logic as array expressions instead of loops with if-elif-else branches

• Group-wise data manipulations (aggregation, transformation, function application).

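##### NumPy ndarray: a multidimensional array object

ndarray is a generic multidimensional container for homogeneous data; that is, all of the elements must be of the same type.

Every array has a shape (a tuple giving the size of each dimension), a dtype (an object describing the data type of its elements), and ndim (the number of dimensions).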

In [31]:
#creating ndarrays using array and asarray

data1 = [6,7.4, 8, 0, 1, 7 , 4, -5]
arr1 = np.array(data1)
print(arr1)
arr1 = np.asarray(data1)
print(arr1)

[ 6.   7.4  8.   0.   1.   7.   4.  -5. ]
[ 6.   7.4  8.   0.   1.   7.   4.  -5. ]


In [7]:
data2 = [[1,2,3,4],[5,6,7,8]]
arr2 = np.array(data2)
arr2

array([[1, 2, 3, 4],
       [5, 6, 7, 8]])

In [11]:
print(arr1.ndim, arr2.ndim)
print(arr1.shape, arr2.shape)
print(arr1.dtype, arr2.dtype)

1 2
(8,) (2, 4)
float64 int32


In [16]:
#In addition to np.array, there are other ways to create arrays.
a= np.zeros(5)
b = np.ones(5)
print(a)
print(b)


[0. 0. 0. 0. 0.]
[1. 1. 1. 1. 1.]


In [25]:
#np.zeros and np.ones creates arrays of 0's and 1's
print(np.zeros((3,6)))
print(np.ones((1,4)))

#np.empty creates an array without initializing its values; the contents are arbitrary and may or may not be zeros
print(np.empty((2,3,2)))

[[0. 0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 0. 0.]]
[[1. 1. 1. 1.]]
[[[0. 0.]
  [0. 0.]
  [0. 0.]]

 [[0. 0.]
  [0. 0.]
  [0. 0.]]]


In [28]:
#arange is an array-valued version of the builtin python range function

np.arange(15)

array([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14])

#### Table 4-1. Array creation functions:

Function --> Description

array --> Convert input data (list, tuple, array, or other sequence type) to an ndarray either by inferring a dtype or explicitly specifying a dtype. Copies the input data by default.

asarray --> Convert input to ndarray, but do not copy if the input is already an ndarray

arange --> Like the built-in range but returns an ndarray instead of a list.

ones, ones_like --> Produce an array of all 1’s with the given shape and dtype. ones_like takes another array and produces a ones array of the same shape and dtype.

zeros, zeros_like --> Like ones and ones_like but producing arrays of 0’s instead

empty, empty_like --> Create new arrays by allocating new memory, but do not populate them with any values as ones and zeros do

eye, identity --> Create a square N x N identity matrix (1’s on the diagonal and 0’s elsewhere)

In [30]:
np.asarray(a)

array([0., 0., 0., 0., 0.])
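Several functions from Table 4-1 are not exercised in the cells above. The following is a minimal sketch of the `_like` variants, `eye`, and the copy behaviour of `array` versus `asarray`; the variable names (`base`, `copied`, `same`) are illustrative only.

```python
import numpy as np

base = np.array([[1, 2, 3], [4, 5, 6]], dtype=np.int32)

ones = np.ones_like(base)      # same shape (2, 3) and dtype int32, filled with 1
zeros = np.zeros_like(base)    # same shape and dtype, filled with 0
ident = np.eye(3)              # 3 x 3 identity matrix, float64 by default

# np.array copies its input by default; np.asarray returns the input
# itself when it is already an ndarray and no dtype change is requested.
copied = np.array(base)
same = np.asarray(base)

print(ones.shape, ones.dtype)
print(ident)
print(copied is base, same is base)
```

Because `asarray` avoids the copy, it is a reasonable choice at the top of a function that should accept either lists or arrays.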

#### Datatypes for ndarrays

A datatype, or dtype, is a special object containing the information the ndarray needs to interpret a chunk of memory as a particular type of data.


Type --> Type code --> Description

int8, uint8 --> i1, u1 --> Signed and unsigned 8-bit (1 byte) integer types

int16, uint16 --> i2, u2 --> Signed and unsigned 16-bit integer types

int32, uint32 --> i4, u4 --> Signed and unsigned 32-bit integer types

int64, uint64 --> i8, u8 --> Signed and unsigned 64-bit integer types

float16 --> f2 --> Half-precision floating point

float32 --> f4 or f --> Standard single-precision floating point. Compatible with C float

float64 --> f8 or d --> Standard double-precision floating point. Compatible with C double and Python float object

float128 --> f16 or g --> Extended-precision floating point

complex64, complex128, complex256 --> c8, c16, c32 --> Complex numbers represented by two 32-, 64-, or 128-bit floats, respectively

bool --> ? --> Boolean type storing True and False values

object --> O --> Python object type

string_ --> S --> Fixed-length string type (1 byte per character). For example, to create a string dtype with length 10, use 'S10'. Named bytes_ in NumPy 2.0 and later.

unicode_ --> U --> Fixed-length unicode type (number of bytes platform specific). Same specification semantics as string_ (e.g. 'U10'). Named str_ in NumPy 2.0 and later.

In [34]:
arr1 = np.array([1,2,3],dtype = np.float64)
arr2 = np.array([1,2,3],dtype = np.int32)
print(arr1.dtype, arr2.dtype)

float64 int32


In [42]:
#convert or cast an array from one dtype to another using astype method:

arr = np.array([1,2,3,4,5])
print(arr.dtype)
print(arr)

arr = arr.astype(np.float64)
print(arr.dtype)
print(arr)

arr = arr.astype(np.bytes_)  # np.bytes_ replaces the np.string_ alias, which NumPy 2.0 removed
print(arr.dtype)
print(arr)

arr = arr.astype(np.complex64)
print(arr.dtype)
print(arr)

arr = arr.astype(np.int32)
print(arr.dtype)
print(arr)

int32
[1 2 3 4 5]
float64
[1. 2. 3. 4. 5.]
|S32
[b'1.0' b'2.0' b'3.0' b'4.0' b'5.0']
complex64
[1.+0.j 2.+0.j 3.+0.j 4.+0.j 5.+0.j]
int32
[1 2 3 4 5]
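Two further properties of `astype` are worth seeing in isolation: casting floats to integers discards the fractional part, and strings that represent numbers can be parsed into a numeric dtype. This is a small sketch with made-up values.

```python
import numpy as np

floats = np.array([3.7, -1.2, -2.6, 0.5, 12.9, 10.1])
ints = floats.astype(np.int32)        # fractional part is discarded (truncation toward zero)
print(ints)

numeric_strings = np.array(['1.25', '-9.6', '42'])
parsed = numeric_strings.astype(np.float64)   # raises ValueError if a string is not numeric
print(parsed)

# With default arguments astype returns a new array, even when the dtype is unchanged.
same_dtype = floats.astype(np.float64)
print(same_dtype is floats)
```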




#### Operations between arrays and scalars

Arrays are important because they let you express batch operations on data without writing any for loops. This is usually called vectorization. Any arithmetic operation between equal-size arrays applies the operation elementwise, and an operation between an array and a scalar applies the scalar to every element. Operations between differently sized arrays are called broadcasting.

In [43]:
arr = np.array([[1.,2.,3.],[4.,5.,6.]])
arr

array([[1., 2., 3.],
       [4., 5., 6.]])

In [44]:
arr*arr

array([[ 1.,  4.,  9.],
       [16., 25., 36.]])

In [45]:
1/arr

array([[1.        , 0.5       , 0.33333333],
       [0.25      , 0.2       , 0.16666667]])

In [46]:
arr*.5

array([[0.5, 1. , 1.5],
       [2. , 2.5, 3. ]])

In [47]:
arr+arr

array([[ 2.,  4.,  6.],
       [ 8., 10., 12.]])
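The cells above cover equal-size arrays and scalars. Broadcasting between arrays of different shapes can be sketched as follows; `row` and `col` are illustrative names, not part of the original notebook.

```python
import numpy as np

arr = np.array([[1., 2., 3.], [4., 5., 6.]])   # shape (2, 3)
row = np.array([10., 20., 30.])                # shape (3,)
col = np.array([[100.], [200.]])               # shape (2, 1)

# The 1D row is stretched across both rows of arr.
by_row = arr + row
print(by_row)

# The (2, 1) column is stretched across all three columns of arr.
by_col = arr + col
print(by_col)
```

Shapes are compared from the trailing dimension backwards; two dimensions are compatible when they are equal or when one of them is 1.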

##### basic indexing and slicing

There are many ways to select a subset of the data or individual elements.

In [54]:
arr = np.arange(10)
print(arr)
print(arr[5])
print(arr[5:8])

arr[5:8]=12
print(arr[5:8])
print(arr)

arr_slice = arr[5:8]
arr_slice[1] = 12345

print(arr)
print(arr_slice[:])

[0 1 2 3 4 5 6 7 8 9]
5
[5 6 7]
[12 12 12]
[ 0  1  2  3  4 12 12 12  8  9]
[    0     1     2     3     4    12 12345    12     8     9]
[   12 12345    12]
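The output above shows that a slice is a view: writing to `arr_slice` changed `arr`. When an independent copy is wanted, `.copy()` can be called on the slice. A minimal sketch, with illustrative variable names:

```python
import numpy as np

arr = np.arange(10)

view = arr[5:8]                  # a view: shares memory with arr
independent = arr[5:8].copy()    # an explicit copy: owns its own data

view[:] = 64                     # writes through to arr
independent[:] = -1              # leaves arr untouched

print(arr)
print(independent)
print(np.shares_memory(arr, view), np.shares_memory(arr, independent))
```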


In [60]:
arr2d = np.array([[1,2,3],[4,5,6],[7,8,9]])
print(arr2d[2])
print(arr2d[2][2])
print(arr2d[1][0])
print(arr2d[1,0])

[7 8 9]
9
4
4


In [83]:
arr3d = np.array([[[1, 2, 3], [4, 5, 6]], [[7, 8, 9], [10, 11, 12]]])
print(arr3d.shape)
print()
print(arr3d)
print()
print(arr3d[0])
print()
print(arr3d[0][0])
print(arr3d[0][0][0])
print(arr3d[0][0][1])
print(arr3d[0][0][2])
print(arr3d[0][1][0])
print(arr3d[1][1][0])
print(arr3d[1][1][2])

arr3d[0][1]=42
arr3d[0][0]=32
print(arr3d)

(2, 2, 3)

[[[ 1  2  3]
  [ 4  5  6]]

 [[ 7  8  9]
  [10 11 12]]]

[[1 2 3]
 [4 5 6]]

[1 2 3]
1
2
3
4
10
12
[[[32 32 32]
  [42 42 42]]

 [[ 7  8  9]
  [10 11 12]]]


In [112]:
print(arr)
print()
print(arr[1:6])
print()
print(arr2d)
print()
print(arr2d[:2])
print()
print(arr2d[:2,:1])
print()
print(arr2d[:,:1])
print()
print(arr2d[:,2:3])
print()
print(arr2d[1:2,:])
print()
print(arr2d[2:3,:])
print()
print(arr2d[:2,1:])
print()
print(arr2d[:2,0:2])

[    0     1     2     3     4    12 12345    12     8     9]

[ 1  2  3  4 12]

[[1 2 3]
 [4 5 6]
 [7 8 9]]

[[1 2 3]
 [4 5 6]]

[[1]
 [4]]

[[1]
 [4]
 [7]]

[[3]
 [6]
 [9]]

[[4 5 6]]

[[7 8 9]]

[[2 3]
 [5 6]]

[[1 2]
 [4 5]]


##### Boolean indexing

Let’s consider an example where we have some data in an array and an array of names with duplicates. The randn function in numpy.random generates some random normally distributed data:


In [121]:
names = np.array(['Bob', 'Joe', 'Will', 'Bob', 'Will', 'Joe', 'Joe'])
data = np.random.randn(7,4)
print(names)
data

['Bob' 'Joe' 'Will' 'Bob' 'Will' 'Joe' 'Joe']


array([[-1.14478997, -0.43065902,  0.63283902, -0.65900513],
       [ 0.49324263, -1.41013173,  0.12403513, -0.60476185],
       [ 0.55761862,  0.10438399, -0.46908344,  0.00314616],
       [-0.90946192, -0.07282316,  0.44594104,  0.03058698],
       [ 0.72219361,  2.09738358,  2.01712753, -1.35343537],
       [-0.95701322,  0.4130192 ,  0.36190586,  1.77949328],
       [-0.41289075, -0.01789793, -0.17786204,  0.99859043]])

Suppose each name corresponds to a row in the data array and we want to select all the rows with the corresponding name 'Bob'. Like arithmetic operations, comparisons (such as ==) with arrays are also vectorized. Thus, comparing names with the string 'Bob' yields a boolean array:

In [122]:
names == 'Bob'

array([ True, False, False,  True, False, False, False])

This boolean array can be passed when indexing the array:

In [123]:
data[names=='Bob']

array([[-1.14478997, -0.43065902,  0.63283902, -0.65900513],
       [-0.90946192, -0.07282316,  0.44594104,  0.03058698]])

The boolean array must be of the same length as the axis it’s indexing. You can even mix and match boolean arrays with slices or integers:

In [124]:
data[names == 'Bob',2:]

array([[ 0.63283902, -0.65900513],
       [ 0.44594104,  0.03058698]])

In [125]:
data[names == 'Bob',3]

array([-0.65900513,  0.03058698])

To select everything except 'Bob', we can use != or negate the condition using ~

In [126]:
names != 'Bob'

array([False,  True,  True, False,  True,  True,  True])

In [127]:
data[names != 'Bob']

array([[ 0.49324263, -1.41013173,  0.12403513, -0.60476185],
       [ 0.55761862,  0.10438399, -0.46908344,  0.00314616],
       [ 0.72219361,  2.09738358,  2.01712753, -1.35343537],
       [-0.95701322,  0.4130192 ,  0.36190586,  1.77949328],
       [-0.41289075, -0.01789793, -0.17786204,  0.99859043]])

In [129]:
data[~(names=='Bob')]

array([[ 0.49324263, -1.41013173,  0.12403513, -0.60476185],
       [ 0.55761862,  0.10438399, -0.46908344,  0.00314616],
       [ 0.72219361,  2.09738358,  2.01712753, -1.35343537],
       [-0.95701322,  0.4130192 ,  0.36190586,  1.77949328],
       [-0.41289075, -0.01789793, -0.17786204,  0.99859043]])

To select two of the three names, combine multiple boolean conditions with boolean arithmetic operators like & (and) and | (or):

In [130]:
mask = (names == 'Bob') | (names == 'Will')
data[mask]

array([[-1.14478997, -0.43065902,  0.63283902, -0.65900513],
       [ 0.55761862,  0.10438399, -0.46908344,  0.00314616],
       [-0.90946192, -0.07282316,  0.44594104,  0.03058698],
       [ 0.72219361,  2.09738358,  2.01712753, -1.35343537]])
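One common pitfall when combining conditions is reaching for the Python keywords `and` and `or`, which do not operate element-wise. The sketch below reuses the `names` array from above and shows the error that results; the `raised` flag is only there to make the behaviour explicit.

```python
import numpy as np

names = np.array(['Bob', 'Joe', 'Will', 'Bob', 'Will', 'Joe', 'Joe'])

# Element-wise combination: the parentheses are required because | and &
# bind more tightly than == in Python.
mask = (names == 'Bob') | (names == 'Will')
print(mask)

# The keyword form asks for the truth value of a whole array, which is ambiguous.
raised = False
try:
    bad = (names == 'Bob') or (names == 'Will')
except ValueError as exc:
    raised = True
    print('ValueError:', exc)
```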

To set all of the negative values in data to 0 we need only do:

In [132]:
data[data<0]=0
data

array([[0.        , 0.        , 0.63283902, 0.        ],
       [0.49324263, 0.        , 0.12403513, 0.        ],
       [0.55761862, 0.10438399, 0.        , 0.00314616],
       [0.        , 0.        , 0.44594104, 0.03058698],
       [0.72219361, 2.09738358, 2.01712753, 0.        ],
       [0.        , 0.4130192 , 0.36190586, 1.77949328],
       [0.        , 0.        , 0.        , 0.99859043]])

Setting whole rows or columns using a 1D boolean array is also easy:

In [133]:
data[names != 'Joe']=7
data

array([[7.        , 7.        , 7.        , 7.        ],
       [0.49324263, 0.        , 0.12403513, 0.        ],
       [7.        , 7.        , 7.        , 7.        ],
       [7.        , 7.        , 7.        , 7.        ],
       [7.        , 7.        , 7.        , 7.        ],
       [0.        , 0.4130192 , 0.36190586, 1.77949328],
       [0.        , 0.        , 0.        , 0.99859043]])

##### Fancy indexing

Fancy indexing is a term for indexing using integer arrays.

In [136]:
arr = np.empty((8,4))
arr

array([[1.02829041e-311, 1.10176639e-321, 0.00000000e+000,
        0.00000000e+000],
       [6.23034601e-307, 1.16096346e-028, 1.04857803e-142,
        1.16467185e-028],
       [1.20269400e-153, 6.01356675e-154, 6.01347002e-154,
        1.50102763e-153],
       [6.35229961e-067, 4.56335201e-072, 1.79936126e-153,
        4.10115910e+223],
       [2.99068648e-067, 4.58546060e-072, 1.21691798e+132,
        4.10114587e+223],
       [1.05394313e+141, 1.79938770e-153, 2.69379802e+132,
        8.48939199e+136],
       [5.54481851e-048, 1.21691798e+132, 9.69638060e+140,
        6.19343989e-071],
       [8.49078911e+136, 4.10123873e+223, 4.10114638e+223,
        9.11160086e+130]])

In [137]:
for i in range(8):
    arr[i] = i
arr

array([[0., 0., 0., 0.],
       [1., 1., 1., 1.],
       [2., 2., 2., 2.],
       [3., 3., 3., 3.],
       [4., 4., 4., 4.],
       [5., 5., 5., 5.],
       [6., 6., 6., 6.],
       [7., 7., 7., 7.]])

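To select a subset of the rows in a particular order, simply pass a list or ndarray of integers specifying the desired order. Negative indices select rows from the end; the second example below writes them with bitwise negation, where ~n equals -(n + 1), so ~3 is -4.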

In [138]:
arr[[4,3,0,6]]

array([[4., 4., 4., 4.],
       [3., 3., 3., 3.],
       [0., 0., 0., 0.],
       [6., 6., 6., 6.]])

In [139]:
arr[[~3,~5,~7]]

array([[4., 4., 4., 4.],
       [2., 2., 2., 2.],
       [0., 0., 0., 0.]])

Passing multiple index arrays selects a 1D array of elements corresponding to each tuple of indices:

In [140]:
arr = np.arange(32).reshape((8,4))
arr

array([[ 0,  1,  2,  3],
       [ 4,  5,  6,  7],
       [ 8,  9, 10, 11],
       [12, 13, 14, 15],
       [16, 17, 18, 19],
       [20, 21, 22, 23],
       [24, 25, 26, 27],
       [28, 29, 30, 31]])

In [141]:
arr[[1,5,7,2],[0,3,1,2]]

array([ 4, 23, 29, 10])

In [142]:
#in the previous cell, the elements (1, 0), (5, 3), (7, 1), and (2, 2) were selected
#to select the rectangular region formed by those rows and columns instead:

arr[[1,5,7,2]][:,[0,3,1,2]]

array([[ 4,  7,  5,  6],
       [20, 23, 21, 22],
       [28, 31, 29, 30],
       [ 8, 11,  9, 10]])

Another way is to use the np.ix_ function, which converts two 1D integer arrays to an indexer that selects the rectangular region:

In [144]:
arr[np.ix_([1,5,7,2],[0,3,1,2])]

array([[ 4,  7,  5,  6],
       [20, 23, 21, 22],
       [28, 31, 29, 30],
       [ 8, 11,  9, 10]])
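Unlike basic slicing, fancy indexing copies the selected data into a new array. The difference can be checked with a short sketch like this one (variable names are illustrative):

```python
import numpy as np

arr = np.arange(32).reshape((8, 4))

sliced = arr[1:3]        # basic slicing: a view onto arr
fancy = arr[[1, 2]]      # fancy indexing: a copy holding the same values

fancy[:] = -1            # does not affect arr
after_fancy = arr[1:3].copy()
print(after_fancy)

sliced[:] = 0            # writes through to arr
print(arr[1:3])
```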

##### Transposing Arrays and Swapping Axes

Transposing is a special form of reshaping which similarly returns a view on the underlying data without copying anything. Arrays have the transpose method and also the special T attribute:

In [153]:
arr = np.arange(15).reshape((3,5))
print(arr)
print(arr.shape)
print()
print(arr.T)
print(arr.T.shape)

[[ 0  1  2  3  4]
 [ 5  6  7  8  9]
 [10 11 12 13 14]]
(3, 5)

[[ 0  5 10]
 [ 1  6 11]
 [ 2  7 12]
 [ 3  8 13]
 [ 4  9 14]]
(5, 3)


In [157]:
#computing the inner matrix product X^T X using np.dot
#when a 3x6 matrix is multiplied by a 6x3 matrix, the result is a 3x3 matrix

arr = np.random.randn(6,3)
print(arr)
print()
print(np.dot(arr.T, arr))

[[ 0.63210882  0.19047005 -0.53151694]
 [-1.34476668  0.0580737   0.94630352]
 [-1.35602863  0.35554762 -1.86545799]
 [-0.50559412 -0.49445567 -0.0747253 ]
 [ 1.17142721 -0.16391396  0.29058546]
 [-0.31983558 -0.08757797 -0.67494055]]

[[ 5.77693455 -0.35383938  1.51513082]
 [-0.35383938  0.4450896  -0.66111461]
 [ 1.51513082 -0.66111461  5.20350265]]
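The same product can also be written with the `@` infix operator, which Python 3.5 and later support for matrix multiplication. A small sketch, using a deterministic array `X` in place of the random data above:

```python
import numpy as np

X = np.arange(18, dtype=np.float64).reshape((6, 3))

via_dot = np.dot(X.T, X)      # function form used above
via_operator = X.T @ X        # infix matrix multiplication

print(via_dot.shape)
print(np.allclose(via_dot, via_operator))
```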


For higher-dimensional arrays, transpose will accept a tuple of axis numbers to permute the axes:

In [159]:
arr = np.arange(16).reshape((2,2,4))
arr

array([[[ 0,  1,  2,  3],
        [ 4,  5,  6,  7]],

       [[ 8,  9, 10, 11],
        [12, 13, 14, 15]]])

In [160]:
arr.transpose((1,0,2))

array([[[ 0,  1,  2,  3],
        [ 8,  9, 10, 11]],

       [[ 4,  5,  6,  7],
        [12, 13, 14, 15]]])

ndarray also has the method swapaxes, which takes a pair of axis numbers:

In [163]:
print(arr)
print(arr.shape)

[[[ 0  1  2  3]
  [ 4  5  6  7]]

 [[ 8  9 10 11]
  [12 13 14 15]]]
(2, 2, 4)


In [167]:
print(arr.swapaxes(1,2))
c =arr.swapaxes(1,2)
c.shape

[[[ 0  4]
  [ 1  5]
  [ 2  6]
  [ 3  7]]

 [[ 8 12]
  [ 9 13]
  [10 14]
  [11 15]]]


(2, 4, 2)
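Like transpose, swapaxes returns a view on the data rather than a copy. One way to confirm this, sketched here with the same 2 x 2 x 4 array, is to write through the swapped array and look at the original:

```python
import numpy as np

arr = np.arange(16).reshape((2, 2, 4))
swapped = arr.swapaxes(1, 2)             # shape (2, 4, 2)

print(np.shares_memory(arr, swapped))    # no data was copied

# swapped[i, j, k] refers to arr[i, k, j], so this changes arr[0, 1, 3].
swapped[0, 3, 1] = -1
print(arr[0, 1, 3])
```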

##### Universal array functions

A ufunc, or universal function, is a function that performs elementwise operations on data in ndarrays. Ufuncs are fast vectorized wrappers for simple functions that take one or more scalar values and produce one or more scalar results.

Examples of ufuncs include sqrt, exp, modf, and maximum.

In [181]:
arr = np.arange(10)
print(np.sqrt(arr))
print()
print(np.exp(arr))
print()

x=np.random.randn(8)
y = np.random.randn(8)
print(x)
print()
print(y)
print()
print(np.maximum(x,y))
print()
print(np.minimum(x,y))
print()

[0.         1.         1.41421356 1.73205081 2.         2.23606798
 2.44948974 2.64575131 2.82842712 3.        ]

[1.00000000e+00 2.71828183e+00 7.38905610e+00 2.00855369e+01
 5.45981500e+01 1.48413159e+02 4.03428793e+02 1.09663316e+03
 2.98095799e+03 8.10308393e+03]

[ 0.60945621  1.08698139 -0.57719436 -0.84568874 -2.26114071  0.82047832
  0.4691058  -0.59480544]

[-0.79578356 -0.10041103  0.70434806 -0.39202691 -0.38540209 -2.22253461
  0.17528971  1.17548793]

[ 0.60945621  1.08698139  0.70434806 -0.39202691 -0.38540209  0.82047832
  0.4691058   1.17548793]

[-0.79578356 -0.10041103 -0.57719436 -0.84568874 -2.26114071 -2.22253461
  0.17528971 -0.59480544]



In [180]:
arr = np.random.randn(7)*5
np.modf(arr)#returns fractional and integral part of floating point array

(array([-0.49665824,  0.72819658, -0.39525518,  0.11124541,  0.66727522,
        -0.27474784, -0.1155083 ]), array([-3.,  8., -0.,  2.,  1., -2., -2.]))

##### Table 4-3. Unary ufuncs

Function Description

    abs, fabs --> Compute the absolute value element-wise for integer, floating point, or complex values. Use fabs as a faster alternative for non-complex-valued data

    sqrt --> Compute the square root of each element. Equivalent to arr ** 0.5

    square --> Compute the square of each element. Equivalent to arr ** 2

    exp --> Compute the exponent e^x of each element
    
    log, log10, log2, log1p --> Natural logarithm (base e), log base 10, log base 2, and log(1 + x), respectively

    sign -->Compute the sign of each element: 1 (positive), 0 (zero), or -1 (negative)

    ceil -->Compute the ceiling of each element, i.e. the smallest integer greater than or equal to each element

    floor -->Compute the floor of each element, i.e. the largest integer less than or equal to each element

    rint -->Round elements to the nearest integer, preserving the dtype

    modf -->Return fractional and integral parts of array as separate array

    isnan -->Return boolean array indicating whether each value is NaN (Not a Number)

    isfinite, isinf -->Return boolean array indicating whether each element is finite (non-inf, non-NaN) or infinite, respectively

    cos, cosh, sin, sinh, tan, tanh --> Regular and hyperbolic trigonometric functions
    
    arccos, arccosh, arcsin, arcsinh, arctan, arctanh -->Inverse trigonometric functions
    
    logical_not --> Compute truth value of not x element-wise. Equivalent to ~arr for boolean arrays.
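A few of the unary ufuncs from Table 4-3 applied to a small, made-up array. This is only a sketch to show the rounding-related functions side by side:

```python
import numpy as np

vals = np.array([-2.5, -0.4, 0.0, 1.5, 3.7])

print(np.abs(vals))
print(np.sign(vals))     # -1, 0, or 1 per element
print(np.ceil(vals))     # round toward +infinity
print(np.floor(vals))    # round toward -infinity
print(np.rint(vals))     # nearest integer, ties go to the even value

with_nan = np.array([1.0, np.nan, np.inf])
print(np.isnan(with_nan))
print(np.isfinite(with_nan))
```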

##### Table 4-4. Binary universal functions

Function Description

    add -->Add corresponding elements in arrays
    
    subtract -->Subtract elements in second array from first array

    multiply -->Multiply array elements

    divide, floor_divide -->Divide or floor divide (truncating the remainder)

    power -->Raise elements in first array to powers indicated in second array

    maximum, fmax -->Element-wise maximum. fmax ignores NaN

    minimum, fmin -->Element-wise minimum. fmin ignores NaN

    mod -->Element-wise modulus (remainder of division)

    copysign -->Copy sign of values in second argument to values in first argument

    greater, greater_equal, less, less_equal, equal, not_equal --> Perform element-wise comparison, yielding boolean array. Equivalent to infix operators >, >=, <, <=, ==, !=

    logical_and, logical_or, logical_xor --> Compute element-wise truth value of logical operation. Equivalent to infix operators &, |, ^ for boolean arrays
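To make a couple of the rows in Table 4-4 concrete, the sketch below contrasts `maximum` with `fmax` when NaN is present and shows the logical ufuncs alongside their infix equivalents. The arrays are invented for illustration.

```python
import numpy as np

a = np.array([1.0, np.nan, 3.0, -4.0])
b = np.array([2.0, 5.0, np.nan, -6.0])

print(np.add(a, b))          # same as a + b
max_result = np.maximum(a, b)   # NaN in either input propagates
fmax_result = np.fmax(a, b)     # NaN is ignored when the other value is a number
print(max_result)
print(fmax_result)

p = np.array([True, True, False, False])
q = np.array([True, False, True, False])
print(np.logical_and(p, q))  # same as p & q for boolean arrays
print(np.logical_xor(p, q))  # same as p ^ q for boolean arrays
```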

### Data processing using Arrays

##### Expressing conditional logic as Array operations

The numpy.where function is a vectorized version of the ternary expression x if condition else y.

In [193]:
xarr = np.array([1.1,1.2,1.3,1.4,1.5])
yarr = np.array([2.1,2.2,2.3,2.4,2.5])
cond = np.array([True, False, True, True, False])

Suppose we wanted to take a value from xarr whenever the corresponding value in cond is True, and otherwise take the value from yarr:

In [194]:
res = [(x if c else y) for x, y, c in zip(xarr, yarr, cond)]
res

[1.1, 2.2, 1.3, 1.4, 2.5]

This approach is slow for large arrays because all the work is done in interpreted Python code, and it does not work with multidimensional arrays. np.where handles both cases concisely:

In [204]:
res = np.where(cond, xarr, yarr)
res

array([1.1, 2.2, 1.3, 1.4, 2.5])

The second and third arguments to np.where don’t need to be arrays; one or both of them can be scalars. A typical use of where in data analysis is to produce a new array of values based on another array. Suppose you had a matrix of randomly generated data and you wanted to replace all positive values with 2 and all negative values with -2.

This is very easy to do with np.where:

In [205]:
arr = np.random.randn(4,4)
arr

array([[ 0.20705167, -1.64968722, -0.01526411,  1.06398242],
       [-0.06868531,  0.23358623,  0.435645  , -0.05574931],
       [-0.85665139,  1.24170389, -0.77377435,  0.35366519],
       [ 0.28914666, -0.11310177,  0.72362003, -1.56257062]])

In [206]:
np.where(arr > 0,2,-2)

array([[ 2, -2, -2,  2],
       [-2,  2,  2, -2],
       [-2,  2, -2,  2],
       [ 2, -2,  2, -2]])

In [207]:
np.where(arr > 0, 2, arr)#set only positive values to 2

array([[ 2.        , -1.64968722, -0.01526411,  2.        ],
       [-0.06868531,  2.        ,  2.        , -0.05574931],
       [-0.85665139,  2.        , -0.77377435,  2.        ],
       [ 2.        , -0.11310177,  2.        , -1.56257062]])
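Calls to np.where can also be nested to express if-elif-else logic over whole arrays. A minimal sketch with two hypothetical boolean arrays, `cond1` and `cond2`:

```python
import numpy as np

cond1 = np.array([True, True, False, False])
cond2 = np.array([True, False, True, False])

# 0 when both hold, 1 when only cond1 holds, 2 when only cond2 holds, 3 otherwise
result = np.where(cond1 & cond2, 0,
                  np.where(cond1, 1,
                           np.where(cond2, 2, 3)))
print(result)
```

The conditions are evaluated in order from the outermost call inward, just as the branches of an if-elif-else chain would be.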

##### Mathematical and statistical methods

A set of mathematical functions which compute statistics about an entire array or about the data along an axis are accessible as array methods. Aggregations (often called reductions) like sum, mean, and standard deviation std can either be used by calling the array instance method or using the top level NumPy function:

In [208]:
arr = np.random.randn(5,4)
print(arr.mean())
print(np.mean(arr))

0.005241598828765704
0.005241598828765704


In [209]:
print(arr.sum())
print(np.sum(arr))

0.10483197657531407
0.10483197657531407


In [210]:
print(arr.mean(axis=1))
print(arr.sum(axis=0))

[ 0.11464304  0.1055471   0.28550506 -0.57475193  0.09526472]
[ 0.37614803  1.43132703  3.62156102 -5.3242041 ]


Other methods like cumsum and cumprod do not aggregate; instead they produce an array of the intermediate results.

In [211]:
arr = np.array([[0,1,2],[3,4,5],[6,7,8]])

In [212]:
arr.cumsum(0)#axis=0

array([[ 0,  1,  2],
       [ 3,  5,  7],
       [ 9, 12, 15]], dtype=int32)

In [213]:
arr.cumprod(1)

array([[  0,   0,   0],
       [  3,  12,  60],
       [  6,  42, 336]], dtype=int32)

##### Table 4-5. Basic array statistical methods

Method Description

    sum 
    Sum of all the elements in the array or along an axis. Zero-length arrays have sum 0.

    mean 
    Arithmetic mean. Zero-length arrays have NaN mean.

    std, var 
    Standard deviation and variance, respectively, with optional degrees of freedom adjustment (default denominator n).

    min, max 
    Minimum and maximum.

    argmin, argmax 
    Indices of minimum and maximum elements, respectively.

    cumsum 
    Cumulative sum of elements starting from 0

    cumprod 
    Cumulative product of elements starting from 1
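The degrees-of-freedom adjustment and the arg-functions in Table 4-5 are easy to misread, so here is a brief sketch on a small array whose statistics can be checked by hand:

```python
import numpy as np

arr = np.array([[1., 2., 3.], [4., 5., 6.]])

pop_var = arr.var()              # denominator n (the default)
sample_var = arr.var(ddof=1)     # denominator n - 1
print(pop_var, sample_var)

flat_index = arr.argmax()        # index into the flattened array
row_index = arr.argmax(axis=1)   # index of the maximum within each row
print(flat_index, row_index)

print(arr.min(axis=0), arr.max(axis=0))
```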

#### Methods for Boolean Arrays

Boolean values are coerced to 1 (True) and 0 (False) in the above methods. Thus, sum is often used as a means of counting True values in a boolean array:

In [216]:
arr = np.random.randn(100)
print(arr)
(arr>0).sum()#No of positive values

[ 0.53997781  0.00557718  0.35984111  1.45981631  0.67945613  0.86526588
 -0.06257031  0.19391499  0.26350424  0.16653239 -2.26139625  0.03222884
  0.38772186 -0.02920015  2.07055088  0.41144041  0.18263354  1.27595074
 -1.36344268  1.12433909 -0.5311669  -0.44578708 -0.06532811 -0.17672435
  0.2464132   0.63560808 -0.82559619 -0.48477349 -1.24075968 -0.84990056
 -1.35572017 -0.05210918  0.15583447 -0.6824746   0.49759473  1.74805482
  0.73867788 -0.34483352 -0.3305697   0.14231644  0.10836185  0.46766455
 -1.07693063 -1.39410524  0.96539107 -1.83756928  0.77419284 -1.38322299
 -0.68181225  0.94070244 -0.36381538  1.3382129   1.23746081 -0.50638834
  0.35967382  1.12625359  0.33722728  0.20767591 -1.35591847  0.43420214
  0.57423948 -0.01364882  0.00550492  0.80191666 -0.61298246  1.03062479
 -1.81161318  1.66821431  0.9486599  -0.93817414  0.09581623  0.5024277
 -0.69554945 -0.08160561  0.24005434 -0.34653952 -0.11679419  0.59006543
 -1.12877794  1.18999999  0.33116704 -0.64938849 -0.

55

There are two additional methods, any and all. any tests whether one or more values in an array is True, while all checks whether every value is True. These methods also work with non-boolean arrays, where non-zero elements evaluate to True.

In [219]:
bools = np.array([False, False, True, False, True])
print(bools.any(), bools.all())

True False
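The non-boolean case mentioned above, and the use of an axis argument, can be sketched like this (the arrays are illustrative):

```python
import numpy as np

counts = np.array([0, 3, 0, 7])
print(counts.any())   # at least one element is non-zero
print(counts.all())   # some elements are zero

grid = np.array([[1, 0], [2, 5]])
per_row = grid.all(axis=1)   # checks each row separately
print(per_row)
```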


##### Sorting

Numpy arrays can be sorted in-place using the sort method:

In [223]:
arr = np.random.randn(8)
print(arr)
arr.sort()
print(arr)

[ 2.88911986 -3.4669274  -0.16846677  0.12726595 -1.87548577  0.52623416
 -1.72633136  0.99908648]
[-3.4669274  -1.87548577 -1.72633136 -0.16846677  0.12726595  0.52623416
  0.99908648  2.88911986]


Multidimensional arrays can have each 1D section of values sorted in-place along an axis by passing the axis number to sort

In [225]:
arr = np.random.randn(5,3)
arr

array([[ 2.20229398,  0.92701456,  1.32242046],
       [-0.15579816, -1.27105076, -1.7206375 ],
       [-0.31295484,  1.79582849,  0.05030871],
       [ 0.12422866,  0.86983156, -0.8900745 ],
       [ 1.00874294, -0.19937848,  0.09760879]])

In [226]:
arr.sort()
arr

array([[ 0.92701456,  1.32242046,  2.20229398],
       [-1.7206375 , -1.27105076, -0.15579816],
       [-0.31295484,  0.05030871,  1.79582849],
       [-0.8900745 ,  0.12422866,  0.86983156],
       [-0.19937848,  0.09760879,  1.00874294]])

In [228]:
arr.sort(0)
arr

array([[-1.7206375 , -1.27105076, -0.15579816],
       [-0.8900745 ,  0.05030871,  0.86983156],
       [-0.31295484,  0.09760879,  1.00874294],
       [-0.19937848,  0.12422866,  1.79582849],
       [ 0.92701456,  1.32242046,  2.20229398]])
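The sort method modifies the array in place. The top-level function np.sort returns a sorted copy instead, which is handy for quick order statistics. A sketch with fixed values so the result can be verified by eye:

```python
import numpy as np

arr = np.array([3.0, -1.0, 2.0, 0.5, -4.0, 7.0, 1.0, 6.0, 5.0, -2.0])

sorted_copy = np.sort(arr)     # new array; arr keeps its original order
print(arr[:3])
print(sorted_copy)

# Rough 10% quantile: the value at position int(0.1 * n) in sorted order.
q10 = sorted_copy[int(0.1 * len(sorted_copy))]
print(q10)
```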

##### Unique and other Set Logic

NumPy has some basic set operations for one-dimensional ndarrays. Probably the most commonly used one is np.unique, which returns the sorted unique values in an array

In [229]:
names = np.array(['Bob','Joe','Will','Bob','Joe'])
np.unique(names)

array(['Bob', 'Joe', 'Will'], dtype='<U4')

In [231]:
ints = np.array([3, 3, 3, 2, 2, 1, 1, 4, 4])
np.unique(ints)

array([1, 2, 3, 4])

Another function, np.in1d, tests membership of the values in one array in another, returning a boolean array

In [232]:
values = np.array([6, 0, 0, 3, 2, 5, 6])
np.in1d(values,[2,3,6])

array([ True, False, False,  True,  True, False,  True])

##### Table 4-6. Array set operations

Method Description

unique(x) -->Compute the sorted, unique elements in x
    
intersect1d(x, y) -->Compute the sorted, common elements in x and y
    
union1d(x, y)--> Compute the sorted union of elements
    
in1d(x, y) -->Compute a boolean array indicating whether each element of x is contained in y
    
setdiff1d(x, y) -->Set difference, elements in x that are not in y
    
setxor1d(x, y)--> Set symmetric differences; elements that are in either of the arrays, but not both
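The remaining set functions in Table 4-6 follow the same pattern. The sketch below uses two small integer arrays; it also uses np.isin, which performs the same membership test as in1d and is the form that newer NumPy releases recommend.

```python
import numpy as np

x = np.array([1, 2, 2, 3, 5])
y = np.array([2, 3, 3, 4])

common = np.intersect1d(x, y)    # sorted values present in both
either = np.union1d(x, y)        # sorted values present in at least one
only_x = np.setdiff1d(x, y)      # values in x that are not in y
exclusive = np.setxor1d(x, y)    # values in exactly one of the two
member = np.isin(x, y)           # one boolean per element of x

print(common, either, only_x, exclusive)
print(member)
```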

#### File input and Output with Arrays

##### Storing Arrays on disk in binary format

np.save and np.load are the two workhorse functions for efficiently saving and loading array data on disk.  Arrays are saved by default in an uncompressed raw binary format with file extension .npy

In [233]:
arr = np.array(10)
np.save('some_array', arr)

In [234]:
np.load('some_array.npy')

array(10)

You can save multiple arrays in an uncompressed zip archive using np.savez, passing the arrays as keyword arguments:

In [235]:
np.savez('array_archive.npz', a=arr, b=arr)

When loading an .npz file, you get back a dict-like object which loads the individual arrays lazily:

In [240]:
arch = np.load('array_archive.npz')
arch['b']
arch['a']

array(10)
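The following sketch repeats the save and load round trip inside a temporary directory, so it leaves no files behind. The temporary paths and the choice of `np.arange(10)` as sample data are assumptions made for the example.

```python
import os
import tempfile
import numpy as np

arr = np.arange(10)

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'some_array.npy')
    np.save(path, arr)
    loaded = np.load(path)

    archive_path = os.path.join(tmp, 'array_archive.npz')
    np.savez(archive_path, a=arr, b=arr * 2)
    # The archive object is a context manager; leaving the block closes the file.
    with np.load(archive_path) as arch:
        names = sorted(arch.files)
        b = arch['b']

print(loaded)
print(names, b)
```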

##### linear algebra

Linear algebra, like matrix multiplication, decompositions, determinants, and other square matrix math, is an important part of any array library. Unlike some languages like MATLAB, multiplying two two-dimensional arrays with * is an element-wise product instead of a matrix dot product. As such, there is a function dot, both an array method, and a function in the numpy namespace, for matrix multiplication

In [248]:
x = np.array([[1., 2., 3.], [4., 5., 6.]])
y = np.array([[6., 23.], [-1, 7], [8, 9]])
print(x.shape, y.shape)

(2, 3) (3, 2)


In [249]:
x.dot(y)

array([[ 28.,  64.],
       [ 67., 181.]])

In [250]:
np.dot(x,y)

array([[ 28.,  64.],
       [ 67., 181.]])

numpy.linalg has a standard set of matrix decompositions and things like inverse and determinant.

In [253]:
from numpy.linalg import inv, qr

In [257]:
mat = x.T.dot(x)
print(x)
print(mat)

[[1. 2. 3.]
 [4. 5. 6.]]
[[17. 22. 27.]
 [22. 29. 36.]
 [27. 36. 45.]]


In [260]:
mat.dot

<function ndarray.dot>

###### Commonly-used numpy.linalg functions

Function Description

diag -->
Return the diagonal (or off-diagonal) elements of a square matrix as a 1D array, or convert a 1D array into a square matrix with zeros on the off-diagonal

dot -->
Matrix multiplication

trace -->
Compute the sum of the diagonal elements

det -->
Compute the matrix determinant

eig -->
Compute the eigenvalues and eigenvectors of a square matrix

inv -->
Compute the inverse of a square matrix

pinv -->
Compute the Moore-Penrose pseudo-inverse of a matrix

qr -->
Compute the QR decomposition

svd -->
Compute the singular value decomposition (SVD)

solve -->
Solve the linear system Ax = b for x, where A is a square matrix

lstsq -->
Compute the least-squares solution to y = Xb
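A short sketch of inv, qr, and solve from the table above. Note that the `mat` computed earlier is the product of a 3 x 2 array with a 2 x 3 array, so its rank is at most 2 and it is singular; the example therefore uses a hypothetical invertible matrix `A` instead.

```python
import numpy as np
from numpy.linalg import inv, qr, solve

A = np.array([[4., 1., 2.],
              [1., 3., 0.],
              [2., 0., 5.]])
b = np.array([1., 2., 3.])

A_inv = inv(A)
print(np.allclose(A.dot(A_inv), np.eye(3)))   # A times its inverse is the identity

q, r = qr(A)
print(np.allclose(q.dot(r), A))               # Q times R reconstructs A

x = solve(A, b)
print(np.allclose(A.dot(x), b))               # x satisfies Ax = b
```

For solving a linear system, solve is generally preferable to forming the inverse explicitly and multiplying.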

###### Random number generation

The numpy.random module supplements the built-in Python random with functions for efficiently generating whole arrays of sample values from many kinds of probability distributions.

In [268]:
import numpy as np
samples = np.random.normal(size=(4,4))
samples

Wall time: 0 ns


array([[-0.06097868, -1.185394  ,  2.3819669 ,  0.07939891],
       [-0.24642082, -0.89811377,  2.6130259 , -1.31773516],
       [-0.73057064, -1.06289558,  0.3258592 , -0.35788548],
       [-0.7112134 , -0.05170353, -2.00733875, -1.12437114]])

In [267]:
np.random.randn(4,4)

array([[-0.20938781, -0.93137306, -0.006859  ,  0.33010637],
       [ 0.19876352, -1.22029143,  0.7130936 , -0.79188754],
       [ 0.17739572, -0.34655827, -0.11462757,  0.06021681],
       [ 1.34763149, -0.61490366, -1.1541473 ,  1.35417391]])

##### Partial list of numpy.random functions

Function Description

seed ->Seed the random number generator

permutation ->Return a random permutation of a sequence, or return a permuted range

shuffle ->Randomly permute a sequence in place

rand ->Draw samples from a uniform distribution

randint ->Draw random integers from a given low-to-high range

randn ->Draw samples from a normal distribution with mean 0 and standard deviation 1 (MATLAB-like interface)

binomial ->Draw samples from a binomial distribution

normal ->Draw samples from a normal (Gaussian) distribution

beta ->Draw samples from a beta distribution

chisquare ->Draw samples from a chi-square distribution

gamma ->Draw samples from a gamma distribution

uniform ->Draw samples from a uniform [0, 1) distribution
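Seeding makes random draws reproducible. The sketch below uses the seed function from the table, then shows the Generator interface (np.random.default_rng, available from NumPy 1.17), which keeps the random state in an object instead of globally. The seed value 12345 is arbitrary.

```python
import numpy as np

np.random.seed(12345)
first = np.random.randn(3)
np.random.seed(12345)
second = np.random.randn(3)
print(np.array_equal(first, second))   # same seed, same draws

# Generator API: the state lives in rng, not in the global np.random module.
rng = np.random.default_rng(12345)
samples = rng.standard_normal(size=(4, 4))
print(samples.shape)
```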