# NumPy Basics

### Why NumPy is so efficient
#### 1. NumPy stores data internally in a contiguous block of memory, which makes it fast to access.
#### It has a library of algorithms written in C that operate on this memory without type checking or other interpreter overhead.
#### NumPy arrays also use much less memory than built-in Python sequences.
#### 2. NumPy operations perform complex computations on entire arrays without the need for Python loops.
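The memory claim can be checked roughly. The sketch below compares the bytes in an array's data buffer with an estimate for an equivalent list of Python ints. The list figure is only an approximation: it adds the list object to the size of each int object, as reported by `sys.getsizeof`.

```python
import sys
import numpy as np

n = 1_000
arr = np.arange(n, dtype=np.int64)   # one contiguous buffer of 8-byte integers
lst = list(range(n))                 # a list of pointers to separate int objects

array_bytes = arr.nbytes             # size of the raw data buffer only
# Rough estimate for the list: the list object plus every int object it points to
list_bytes = sys.getsizeof(lst) + sum(sys.getsizeof(x) for x in lst)

print(array_bytes)                   # 8000
print(list_bytes > array_bytes)      # True
print(arr.flags['C_CONTIGUOUS'])     # True: elements sit next to each other in memory
```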

In [1]:
import numpy as np
my_arr = np.arange(1000000) # NumPy array
my_list = list(range(1000000))

In [2]:
B = np.random.rand(1000)
%timeit sum(B)
%timeit np.sum(B)

183 µs ± 6.45 µs per loop (mean ± std. dev. of 7 runs, 10,000 loops each)
11.7 µs ± 676 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)


In [3]:
# Time for NumPy operation
%time for _ in range(10): my_arr2 = my_arr * 2

CPU times: total: 31.2 ms
Wall time: 33 ms


In [4]:
# Time for List operation
%time for _ in range(10): my_list2 = [x * 2 for x in my_list]

CPU times: total: 1.44 s
Wall time: 1.44 s


#### The performance difference is huge. NumPy-based code is typically 10 to 100 times faster (roughly 15x and 45x in the runs above).

### NumPy ndarray
#### A fast, flexible container for large datasets that lets you perform mathematical operations on whole blocks of data.
#### It is a container for homogeneous data: all of its elements must be of the same type.
#### Every array has a shape (a tuple giving the size of each dimension) and a dtype (the data type of the array).
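Homogeneity means NumPy picks one dtype for the whole array. The short sketch below shows mixed inputs being converted to a common type, and it reads the `shape` and `dtype` attributes. The variable names are only illustrative.

```python
import numpy as np

# Mixed ints and floats: NumPy upcasts everything to a common float type
mixed = np.array([1, 2.5, 3])
print(mixed.dtype)              # float64

# Mixing numbers and strings: everything becomes a (Unicode) string
as_text = np.array([1, 'a'])
print(as_text.dtype.kind)       # U (Unicode string)

# Every array carries its shape and dtype
grid = np.zeros((2, 3))
print(grid.shape, grid.dtype)   # (2, 3) float64
```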

#### An array can also be created from a tuple:

In [5]:
import numpy as np

arr1= np.array((2,3,4,6,8,9))
type(arr1)

numpy.ndarray

###### Difference between passing a single-element list and a bare scalar

In [6]:
a = np.array([1])
a.ndim

1

In [7]:
b = np.array(1)  # a bare scalar gives a 0-dimensional array
b.ndim

0

In [9]:
data = np.random.randn(2,3)  # randn takes the dimensions as separate arguments, not as a tuple
data

array([[-0.22011007,  0.77856939,  0.1881794 ],
       [-0.52248939, -1.33046278,  0.00623953]])

In [10]:
data * 10

array([[ -2.2011007 ,   7.78569393,   1.88179403],
       [ -5.22489391, -13.30462782,   0.06239528]])

In [11]:
data + data

array([[-0.44022014,  1.55713879,  0.37635881],
       [-1.04497878, -2.66092556,  0.01247906]])

In [12]:
data.shape

(2, 3)

In [13]:
data.dtype

dtype('float64')

In [14]:
data.ndim

2

data.shape[0] gives the size of the first axis (the number of rows, here 2); data.ndim gives the number of dimensions.

In [15]:
data.shape[0]

2

In [16]:
data.shape[1]

3

In [17]:
a = np.array([ [ [2,3,5,8],[12,5,6,8],[12,5,6,8] ],
              [ [9,3,5,8],[12,5,6,8] ,[12,5,6,8]]] )
a.shape

(2, 3, 4)

In [18]:
a[1,2,0]


12

In [19]:
a.size

24

In [20]:
a.nbytes   # the attribute is nbytes (not nbyte): a.size * a.itemsize = 24 * 4 with the int32 dtype in this run

96

### Different ways to create ndarray
#### 1. array - Creates an array from a list (or other sequence). A list of lists creates a higher-dimensional array.
#### 2. empty - Creates an array without initializing its values, so it may contain garbage values.
#### 3. zeros (and ones) - Creates an array filled with zeros (or ones).
#### 4. arange - Creates an array filled with a range of values, like Python's range.
#### For empty, zeros and ones, pass the size of the array; for multi-dimensional arrays, pass a tuple with the shape.
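Here is a compact sketch of the four creation functions listed above, with shapes passed as tuples. Do not rely on the values returned by `np.empty`. Only its shape is guaranteed.

```python
import numpy as np

a = np.array([[1, 2], [3, 4]])   # nested lists -> 2-D array
z = np.zeros((2, 3))             # shape given as a tuple -> 2 x 3 array of 0.0
e = np.empty(4)                  # 4 slots whose contents are whatever was in memory
r = np.arange(2, 10, 3)          # like range(2, 10, 3)

print(a.ndim, z.shape, e.shape)  # 2 (2, 3) (4,)
print(r)                         # [2 5 8]
```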

In [8]:
data1 = [6, 4.5, 0, 8, 1]
arr1 = np.array(data1)
arr1

array([6. , 4.5, 0. , 8. , 1. ])

In [9]:
data2 = [[1,2,3,4],[5,6,7,8]]
arr2 = np.array(data2)
arr2

array([[1, 2, 3, 4],
       [5, 6, 7, 8]])

In [10]:
np.zeros(10)

array([0., 0., 0., 0., 0., 0., 0., 0., 0., 0.])

In [11]:
np.ones(5)

array([1., 1., 1., 1., 1.])

In [12]:
np.empty((2,3))

array([[6.23042070e-307, 4.67296746e-307, 1.69121096e-306],
       [9.34609111e-307, 1.42413555e-306, 1.78019082e-306]])

In [13]:
np.empty((2,3,1))

array([[[6.23042070e-307],
        [4.67296746e-307],
        [1.69121096e-306]],

       [[9.34609111e-307],
        [1.42413555e-306],
        [1.78019082e-306]]])

In [14]:
np.arange(5)

array([0, 1, 2, 3, 4])

### NumPy Data Types
#### dtype is a special object, containing information (metadata) the ndarray needs to interpret a chunk of memory as a particular data type.
#### dtypes make NumPy flexible when interacting with data from other systems, since they map directly onto the underlying disk or memory representation.
#### This makes it easy to read and write binary streams of data to disk and to connect to code written in low-level languages like C.
#### NumPy tries to infer a good data type for any array it creates.
#### The naming convention is the kind of data followed by the number of bits per element, e.g. int32, float64.
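The number in a dtype name is the element size in bits, so dividing by 8 gives the bytes per element (`itemsize`). The sketch below checks this for a few common dtypes.

```python
import numpy as np

for name in ['int8', 'int32', 'float32', 'float64']:
    dt = np.dtype(name)
    # itemsize is the number of bytes per element; times 8 gives the bits in the name
    print(name, dt.itemsize, dt.itemsize * 8)
# int8 1 8
# int32 4 32
# float32 4 32
# float64 8 64

arr = np.array([1, 2, 3], dtype=np.int16)
print(arr.nbytes)   # 6 = 3 elements * 2 bytes each
```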

In [15]:
arr1 = np.array([1,2,3], dtype = np.float64)
arr1

array([1., 2., 3.])

In [16]:
arr2 = np.array([1,2,3], dtype = np.int32)
arr2

array([1, 2, 3])

In [17]:
arr1.dtype

dtype('float64')

In [18]:
arr2.dtype

dtype('int32')

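#### We can convert, or cast, an array from one dtype to another with astype.
#### We can also directly use the dtype of another array.
#### Data may be lost when converting from a larger dtype to a smaller one, or from float to int (the decimal part is truncated).
#### Data can also be lost due to the nature of the data: NumPy's bytes_ type (np.string_ in older versions) has a fixed size and may truncate input without warning.
#### If the casting fails, for example because a string cannot be parsed as a number, a ValueError is raised.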
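This sketch illustrates the casting rules described above. Float-to-int truncates toward zero. `astype` leaves the original array untouched. An unparseable string raises `ValueError`.

```python
import numpy as np

floats = np.array([3.7, -3.7, 0.5])
ints = floats.astype(np.int64)       # float -> int truncates toward zero
print(ints)                          # [ 3 -3  0]
print(floats[0])                     # 3.7: astype returned a new array, the original is unchanged

# A string that cannot be parsed as a number makes the cast fail
try:
    np.array(['1.5', 'abc']).astype(float)
    failed = False
except ValueError:
    failed = True
print(failed)                        # True
```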

In [19]:
arr = np.array([1,2,3,4])
arr.dtype

dtype('int32')

In [20]:
float_arr = arr.astype(np.float64)
float_arr

array([1., 2., 3., 4.])

In [21]:
arr.astype(float_arr.dtype)

array([1., 2., 3., 4.])

In [22]:
arr = np.array([3.4, -1.4, -4.2, 0.4, 10.4])
arr.astype(np.int32)


array([ 3, -1, -4,  0, 10])

In [23]:
strings = np.array(['1.23','-9.6','43'], dtype=np.bytes_)  # np.bytes_ was aliased as np.string_ before NumPy 2.0

strings.astype(float)

array([ 1.23, -9.6 , 43.  ])

## Arithmetic with NumPy arrays
#### Arrays let you express batch operations on data without writing 'for' loops.
#### Arithmetic operations between equal-sized arrays are applied element-wise.
#### Arithmetic operations with scalars propagate the scalar argument to each element in the array.
#### Comparisons between equal-sized arrays yield boolean arrays.
#### Operations between differently sized arrays are called broadcasting.
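Below is a small sketch of the rules above. It covers an element-wise operation on equal shapes, a scalar operation, a simple broadcast of a 1-D row across a 2-D array, and a comparison that yields a boolean array.

```python
import numpy as np

arr = np.array([[1., 2., 3.],
                [4., 5., 6.]])
row = np.array([10., 20., 30.])

doubled = arr * 2        # scalar: applied to every element
summed = arr + row       # shapes (2, 3) and (3,): row is broadcast across both rows
print(summed)
# [[11. 22. 33.]
#  [14. 25. 36.]]

mask = arr > 2.5         # comparison gives a boolean array of the same shape
print(mask)
# [[False False  True]
#  [ True  True  True]]
```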

In [24]:
arr = np.array([[1.,2.,3.],[4.,5.,6.]])
arr

array([[1., 2., 3.],
       [4., 5., 6.]])

In [25]:
arr * arr

array([[ 1.,  4.,  9.],
       [16., 25., 36.]])

In [26]:
arr - arr

array([[0., 0., 0.],
       [0., 0., 0.]])

In [27]:
1 / arr

array([[1.        , 0.5       , 0.33333333],
       [0.25      , 0.2       , 0.16666667]])

In [28]:
arr ** 0.5

array([[1.        , 1.41421356, 1.73205081],
       [2.        , 2.23606798, 2.44948974]])

In [29]:
arr2 = np.array([[0., 4., 1.],[7., 2., 12.]])
arr2


array([[ 0.,  4.,  1.],
       [ 7.,  2., 12.]])

In [30]:
arr2 > arr

array([[False,  True, False],
       [ True, False,  True]])

## Basic indexing and slicing
#### There are many ways to select a subset of data.
#### 1-D arrays are simple and work similarly to Python lists.

In [31]:
arr = np.arange(10)
arr

array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

In [32]:
arr[5]

5

In [33]:
arr[5:8]

array([5, 6, 7])

#### If you assign a scalar value to a slice, the value is broadcast to the entire selection.
#### An important difference from lists is that array slices are views on the original array.
#### The data is NOT copied, and any modification to the view is reflected in the original array.

In [34]:
arr[5:7] = 12

In [35]:
arr

array([ 0,  1,  2,  3,  4, 12, 12,  7,  8,  9])

#### Assigning to a "bare" slice [:] sets all values in the array.

In [None]:
arr[:] = 64
arr

#### NumPy has been designed this way so that it can work with very large arrays.
#### If slicing always copied the data, it would cause serious performance and memory problems.
#### If you want a copy of a slice instead of a view, you have to copy it explicitly with .copy().
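The sketch below contrasts a view with an explicit copy. `np.shares_memory` is used only to confirm which of the two still points at the original buffer.

```python
import numpy as np

arr = np.arange(10)
view = arr[5:8]          # a view: no data is copied
view[:] = 99             # writes through to arr
print(arr)               # [ 0  1  2  3  4 99 99 99  8  9]

copy = arr[5:8].copy()   # an independent copy
copy[:] = -1             # arr is not affected
print(arr[5:8])          # [99 99 99]

print(np.shares_memory(arr, view), np.shares_memory(arr, copy))  # True False
```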

In [36]:
arr_copy = arr[5:8].copy()    # an independent copy: changing arr_copy does not affect arr
arr_copy[:] = 12
arr

array([ 0,  1,  2,  3,  4, 12, 12,  7,  8,  9])

In [37]:
arr_copy

array([12, 12, 12])

#### For higher dimension arrays, elements at each index are arrays and not scalars.
#### Individual elements can be accessed recursively or by passing a comma-separated list of indices.
#### Axis 0 is rows and axis 1 is columns.

In [38]:
arr2d  = np.array([[1,2,3],
                   [4,5,6],
                   [7,8,9]])
arr2d[2]

array([7, 8, 9])

In [39]:
arr2d[0][2]

3

In [40]:
arr2d[0,2]

3

#### In multidimensional arrays, if you omit later indices, the result is a lower-dimensional array containing all the data along the remaining axes.
#### Both scalar values and arrays can be assigned to such a selection.
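The 2-D example in this section generalizes to three dimensions. The sketch below uses a small 2 x 2 x 3 array to show how omitting trailing indices reduces the dimension, and how scalar and array assignment behave.

```python
import numpy as np

arr3d = np.array([[[1, 2, 3], [4, 5, 6]],
                  [[7, 8, 9], [10, 11, 12]]])   # shape (2, 2, 3)

print(arr3d[0].shape)      # (2, 3): omitting the last two indices gives a 2-D array
print(arr3d[1, 0])         # [7 8 9]: omitting the last index gives a 1-D array

saved = arr3d[0].copy()    # keep the original values
arr3d[0] = 42              # scalar assignment fills the whole 2 x 3 block
arr3d[0] = saved           # array assignment restores it
print(arr3d[0, 1, 2])      # 6
```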

In [41]:
arr2d[0]

array([1, 2, 3])

In [42]:
arr_copy

array([12, 12, 12])

In [43]:
# arr_copy is an independent copy, so the assignment below changes only arr2d
arr_copy = arr2d[0].copy()
arr2d[0] = 42
arr2d

array([[42, 42, 42],
       [ 4,  5,  6],
       [ 7,  8,  9]])

In [44]:
arr_copy

array([1, 2, 3])

In [45]:
arr2d[0]=arr_copy
arr2d

array([[1, 2, 3],
       [4, 5, 6],
       [7, 8, 9]])

In [49]:
b = np.arange(10)
a = b[[2,3,5]]
a

array([2, 3, 5])

In [50]:
a.shape

(3,)

In [None]:
b

### Indexing with Slices
#### ndarrays can be sliced in the same format as Python lists.

In [51]:
arr

array([ 0,  1,  2,  3,  4, 12, 12,  7,  8,  9])

In [52]:
arr[1:6]

array([ 1,  2,  3,  4, 12])

#### Slicing higher-dimensional arrays is a bit different.
#### A single slice selects elements along axis 0 (the rows).
#### To slice along multiple axes, pass multiple comma-separated slices.

In [53]:
arr2d

array([[1, 2, 3],
       [4, 5, 6],
       [7, 8, 9]])

In [54]:
arr2d[::-2]   # step of -2: every second row, starting from the last

array([[7, 8, 9],
       [1, 2, 3]])

In [55]:
arr2d[:2]

array([[1, 2, 3],
       [4, 5, 6]])

In [56]:
arr2d[:2, 1:]

array([[2, 3],
       [5, 6]])

#### Slicing in this way always returns array views with the same number of dimensions.
#### By mixing integer indexes and slices, you get lower-dimensional slices.
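To see how slices and integers affect the number of dimensions, compare the result shapes in this sketch. A length-1 slice keeps an axis. An integer index drops it.

```python
import numpy as np

arr2d = np.array([[1, 2, 3],
                  [4, 5, 6],
                  [7, 8, 9]])

print(arr2d[:2, 1:].shape)   # (2, 2): two slices keep both axes
print(arr2d[1, :2].shape)    # (2,):   an integer index drops axis 0
print(arr2d[1:2, :2].shape)  # (1, 2): a length-1 slice keeps axis 0
print(arr2d[:, :1])          # every row, first column only, still 2-D
# [[1]
#  [4]
#  [7]]
```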

In [57]:
arr2d

array([[1, 2, 3],
       [4, 5, 6],
       [7, 8, 9]])

In [58]:
arr2d[1,:3]

array([4, 5, 6])

In [59]:
arr2d[:3,2]

array([3, 6, 9])

#### Assigning a single value to a slice changes all the elements in the slice.

In [60]:
arr2d[:2,2] = 0
arr2d

array([[1, 2, 0],
       [4, 5, 0],
       [7, 8, 9]])

### Boolean Indexing
#### Comparisons with NumPy arrays are also vectorized, so they produce a boolean array.
#### This array can also be passed when indexing the array.

In [61]:
names = np.array(['Bob', 'Joe', 'Will', 'Bob', 'Will', 'Joe', 'Joe'])
names =='Bob'

array([ True, False, False,  True, False, False, False])

In [62]:
names[names != 'Bob']

array(['Joe', 'Will', 'Will', 'Joe', 'Joe'], dtype='<U4')

In [63]:
data = np.random.randn(7,4)
data

array([[ 0.2422531 ,  0.72826379, -1.15730498,  0.00423394],
       [-0.98375305,  0.35150344, -1.66277649,  0.0717737 ],
       [ 1.81277026, -0.43967277,  0.55285786, -1.2738666 ],
       [-0.93701714, -0.55552212,  1.24158919,  0.42169281],
       [ 1.38371949, -2.46213536, -0.2121563 ,  1.36336038],
       [-1.9982444 ,  0.69390918,  1.21239179,  0.25581862],
       [-0.69780783,  0.79432356, -0.84965185,  1.36712477]])

#### The boolean array must be of the same length as the array axis it is indexing.
#### Older versions of NumPy silently accepted a boolean array of the wrong length; current versions raise an IndexError, so be careful when using this feature.
#### You can mix boolean arrays with slices or integers.
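The sketch below shows the length check. It assumes a reasonably recent NumPy, where a mask of the wrong length raises `IndexError` instead of being accepted. The mask here is deliberately too short.

```python
import numpy as np

data = np.arange(7)
short_mask = np.array([True, False, True])   # length 3, but data has length 7

try:
    data[short_mask]
    raised = False
except IndexError as exc:
    raised = True
    print(exc)   # the message reports the length mismatch
print(raised)    # True
```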

In [64]:
data[names == 'Bob', 2:]

array([[-1.15730498,  0.00423394],
       [ 1.24158919,  0.42169281]])

In [65]:
data[names=='Joe']

array([[-0.98375305,  0.35150344, -1.66277649,  0.0717737 ],
       [-1.9982444 ,  0.69390918,  1.21239179,  0.25581862],
       [-0.69780783,  0.79432356, -0.84965185,  1.36712477]])

#### To select everything except a condition, use the != operator or negate the condition with ~.
#### The ~ operator is useful when you want to invert a boolean array that is already stored in a variable.
#### Combine multiple boolean conditions with the operators & (and) and | (or). Python's 'and' and 'or' keywords do not work with boolean arrays.
#### Selecting data with boolean indexing always creates a copy of the data, even if the returned array is unchanged.
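Here is a sketch of the points above. Conditions are combined with `|` and inverted with `~`. Python's `or` fails on arrays. The selected rows are a copy, not a view. The small integer `data` array is made up for illustration.

```python
import numpy as np

names = np.array(['Bob', 'Joe', 'Will', 'Bob', 'Will', 'Joe', 'Joe'])
data = np.arange(14).reshape(7, 2)

mask = (names == 'Bob') | (names == 'Will')   # parentheses needed: | binds tighter than ==
print(mask.sum())                             # 4 rows selected

not_bob = data[~(names == 'Bob')]             # ~ inverts a boolean array
print(not_bob.shape)                          # (5, 2)

# Python's `or` tries to turn a whole array into a single bool and fails
try:
    (names == 'Bob') or (names == 'Will')
    or_failed = False
except ValueError:
    or_failed = True
print(or_failed)                              # True

# Boolean indexing returns a copy, not a view
selected = data[mask]
selected[:] = -1
print(data[0])                                # [0 1] (unchanged)
```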

In [66]:
names != 'Bob'

array([False,  True,  True, False,  True,  True,  True])

In [67]:
data[~(names == 'Bob')]

array([[-0.98375305,  0.35150344, -1.66277649,  0.0717737 ],
       [ 1.81277026, -0.43967277,  0.55285786, -1.2738666 ],
       [ 1.38371949, -2.46213536, -0.2121563 ,  1.36336038],
       [-1.9982444 ,  0.69390918,  1.21239179,  0.25581862],
       [-0.69780783,  0.79432356, -0.84965185,  1.36712477]])

In [68]:
cond = names == 'Bob'
data[~cond]

array([[-0.98375305,  0.35150344, -1.66277649,  0.0717737 ],
       [ 1.81277026, -0.43967277,  0.55285786, -1.2738666 ],
       [ 1.38371949, -2.46213536, -0.2121563 ,  1.36336038],
       [-1.9982444 ,  0.69390918,  1.21239179,  0.25581862],
       [-0.69780783,  0.79432356, -0.84965185,  1.36712477]])

In [69]:
mask = (names == 'Bob') | (names == 'Will')
mask

array([ True, False,  True,  True,  True, False, False])

In [70]:
data

array([[ 0.2422531 ,  0.72826379, -1.15730498,  0.00423394],
       [-0.98375305,  0.35150344, -1.66277649,  0.0717737 ],
       [ 1.81277026, -0.43967277,  0.55285786, -1.2738666 ],
       [-0.93701714, -0.55552212,  1.24158919,  0.42169281],
       [ 1.38371949, -2.46213536, -0.2121563 ,  1.36336038],
       [-1.9982444 ,  0.69390918,  1.21239179,  0.25581862],
       [-0.69780783,  0.79432356, -0.84965185,  1.36712477]])

In [71]:
data[mask]

array([[ 0.2422531 ,  0.72826379, -1.15730498,  0.00423394],
       [ 1.81277026, -0.43967277,  0.55285786, -1.2738666 ],
       [-0.93701714, -0.55552212,  1.24158919,  0.42169281],
       [ 1.38371949, -2.46213536, -0.2121563 ,  1.36336038]])

#### Setting values with boolean arrays works in a common-sense way: data[data < 0] = 0 sets every negative value to 0.
#### Setting whole rows or columns using a one-dimensional boolean array is also easy.

In [72]:
data[data < 0] = 0
data

array([[0.2422531 , 0.72826379, 0.        , 0.00423394],
       [0.        , 0.35150344, 0.        , 0.0717737 ],
       [1.81277026, 0.        , 0.55285786, 0.        ],
       [0.        , 0.        , 1.24158919, 0.42169281],
       [1.38371949, 0.        , 0.        , 1.36336038],
       [0.        , 0.69390918, 1.21239179, 0.25581862],
       [0.        , 0.79432356, 0.        , 1.36712477]])

In [73]:
names

array(['Bob', 'Joe', 'Will', 'Bob', 'Will', 'Joe', 'Joe'], dtype='<U4')

In [74]:
data[names != 'Joe'] = 7
data

array([[7.        , 7.        , 7.        , 7.        ],
       [0.        , 0.35150344, 0.        , 0.0717737 ],
       [7.        , 7.        , 7.        , 7.        ],
       [7.        , 7.        , 7.        , 7.        ],
       [7.        , 7.        , 7.        , 7.        ],
       [0.        , 0.69390918, 1.21239179, 0.25581862],
       [0.        , 0.79432356, 0.        , 1.36712477]])

### Fancy Indexing
#### It is a term to describe indexing using integer arrays.
#### We can use it to select subset of rows in particular order.
#### Negative indices select rows from the end.

In [75]:
arr = np.empty((8,4))

for i in range(8):
    arr[i] = i
arr

array([[0., 0., 0., 0.],
       [1., 1., 1., 1.],
       [2., 2., 2., 2.],
       [3., 3., 3., 3.],
       [4., 4., 4., 4.],
       [5., 5., 5., 5.],
       [6., 6., 6., 6.],
       [7., 7., 7., 7.]])

In [77]:
arr[[4,3,0,2]]  #rows

array([[4., 4., 4., 4.],
       [3., 3., 3., 3.],
       [0., 0., 0., 0.],
       [2., 2., 2., 2.]])

In [78]:
arr[[-3,-5,-7]]  #rows

array([[5., 5., 5., 5.],
       [3., 3., 3., 3.],
       [1., 1., 1., 1.]])

#### Passing multiple index arrays selects a 1-D array of elements, one for each tuple of corresponding indices.
#### E.g. [[1,5,7,2],[0,3,1,2]] selects the elements (1,0), (5,3), (7,1) and (2,2).
#### With 1-D index arrays, the result is always 1-D, regardless of how many dimensions the array has.
#### To select a rectangular region instead, index the rows first and then the columns, e.g. arr[[1,5,7,2]][:, [0,3,1,2]].
#### Unlike slicing, fancy indexing always copies the data into a new array.
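Besides chaining two indexing steps, NumPy's `np.ix_` helper builds a rectangular selection in one step. The sketch compares the two approaches and checks that the result is a copy.

```python
import numpy as np

arr = np.arange(32).reshape((8, 4))
rows, cols = [1, 5, 7, 2], [0, 3, 1, 2]

pairs = arr[rows, cols]                  # elements (1,0), (5,3), (7,1), (2,2)
print(pairs)                             # [ 4 23 29 10]

block = arr[np.ix_(rows, cols)]          # rectangular region: every listed row x every listed column
print(block.shape)                       # (4, 4)
print(np.array_equal(block, arr[rows][:, cols]))  # True

block[:] = 0                             # fancy indexing returned a copy
print(arr[1, 0])                         # 4 (unchanged)
```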

In [79]:
arr = np.arange(32).reshape((8,4))
arr

array([[ 0,  1,  2,  3],
       [ 4,  5,  6,  7],
       [ 8,  9, 10, 11],
       [12, 13, 14, 15],
       [16, 17, 18, 19],
       [20, 21, 22, 23],
       [24, 25, 26, 27],
       [28, 29, 30, 31]])

In [80]:
arr[[1,5,7,2],[0,3,1,2]]

array([ 4, 23, 29, 10])

In [81]:
arr[[1,5,7,2]][:,[0,3,1,2]]

array([[ 4,  7,  5,  6],
       [20, 23, 21, 22],
       [28, 31, 29, 30],
       [ 8, 11,  9, 10]])

In [82]:
arr[[1,5,7,2]][:,[0,3,2,1]]

array([[ 4,  7,  6,  5],
       [20, 23, 22, 21],
       [28, 31, 30, 29],
       [ 8, 11, 10,  9]])

## Transposing and Swapping Axes
#### Transposing is a special form of reshaping. ndarrays have the 'transpose' method and the special 'T' attribute.
#### 'T' is a special case of swapping axes. ndarrays also have a 'swapaxes' method, which takes a pair of axis numbers and switches the indicated axes to rearrange the data.
#### To compute matrix products, such as the product of an array with its transpose, use the np.dot function.
#### For higher-dimensional arrays, transpose accepts a tuple of axis numbers to permute the axes.
#### All of these methods return a view and do not copy the data.
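The sketch below compares `.T`, `transpose` and `swapaxes` on a 3-D array by their result shapes. It also confirms that `.T` is a view of the original data.

```python
import numpy as np

arr = np.arange(24).reshape((2, 3, 4))

print(arr.T.shape)                     # (4, 3, 2): .T reverses all axes
print(arr.transpose((1, 0, 2)).shape)  # (3, 2, 4): axes 0 and 1 exchanged
print(arr.swapaxes(0, 1).shape)        # (3, 2, 4): same rearrangement via swapaxes

print(np.array_equal(arr.swapaxes(0, 1), arr.transpose((1, 0, 2))))  # True
print(np.shares_memory(arr, arr.T))    # True: a view, not a copy
```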

In [83]:
arr = np.arange(15).reshape((3,5))
arr

array([[ 0,  1,  2,  3,  4],
       [ 5,  6,  7,  8,  9],
       [10, 11, 12, 13, 14]])

In [84]:
arr.T

array([[ 0,  5, 10],
       [ 1,  6, 11],
       [ 2,  7, 12],
       [ 3,  8, 13],
       [ 4,  9, 14]])

In [85]:
arr.T + arr   # shapes (5, 3) and (3, 5) cannot be broadcast together

ValueError: operands could not be broadcast together with shapes (5,3) (3,5) 

In [86]:
# (3, 5) dot (5, 3) -> (3, 3): the inner dimensions (5) match
np.dot(arr, arr.T)

array([[ 30,  80, 130],
       [ 80, 255, 430],
       [130, 430, 730]])

In [87]:
np.dot(arr.T, arr)   # (5, 3) dot (3, 5) -> (5, 5): also valid, but a different result

array([[125, 140, 155, 170, 185],
       [140, 158, 176, 194, 212],
       [155, 176, 197, 218, 239],
       [170, 194, 218, 242, 266],
       [185, 212, 239, 266, 293]])

In [91]:
arr = np.arange(16).reshape((2,2,4))
arr

array([[[ 0,  1,  2,  3],
        [ 4,  5,  6,  7]],

       [[ 8,  9, 10, 11],
        [12, 13, 14, 15]]])

For example, if an array has shape (2, 3, 4), then after applying transpose((1, 0, 2)) the transposed array has shape (3, 2, 4):

the first dimension of the output is the second dimension of the original array (axis 1);
the second dimension of the output is the first dimension of the original array (axis 0);
the third dimension stays the same (axis 2).

Here arr has shape (2, 2, 4), so the shape does not change, but the elements along the first two axes are exchanged.

In [89]:
arr.transpose((1,0,2))

array([[[ 0,  1,  2,  3],
        [ 8,  9, 10, 11]],

       [[ 4,  5,  6,  7],
        [12, 13, 14, 15]]])

In [90]:
arr.transpose((2,0,1))

array([[[ 0,  4],
        [ 8, 12]],

       [[ 1,  5],
        [ 9, 13]],

       [[ 2,  6],
        [10, 14]],

       [[ 3,  7],
        [11, 15]]])

In [None]:
arr

In [None]:
arr.swapaxes(1,2)

## Universal Functions
#### A universal function, or ufunc, is a function that performs element-wise operations on data in ndarrays.
#### Ufuncs can be thought of as fast vectorized wrappers for simple functions.
#### Unary ufuncs accept one array as input. Binary ufuncs take two arrays and return a single array as the result.
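Every ufunc is an `np.ufunc` object. Its `nin` attribute gives the number of inputs, which separates unary from binary ufuncs. The sketch below checks this for `np.sqrt` and `np.add`.

```python
import numpy as np

x = np.array([1., 4., 9.])
y = np.array([2., 2., 2.])

print(isinstance(np.sqrt, np.ufunc))  # True
print(np.sqrt.nin, np.add.nin)        # 1 2: unary vs binary ufunc
print(np.sqrt(x))                     # [1. 2. 3.]
print(np.add(x, y))                   # [ 3.  6. 11.], same as x + y
```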

In [92]:
a= np.random.rand(2)
a.ndim

1

In [93]:
arr = np.arange(10)
arr

array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

In [95]:
arr1 = np.sqrt(arr).astype(int)

In [96]:
np.sqrt(arr)

array([0.        , 1.        , 1.41421356, 1.73205081, 2.        ,
       2.23606798, 2.44948974, 2.64575131, 2.82842712, 3.        ])

In [97]:
np.exp(arr)

array([1.00000000e+00, 2.71828183e+00, 7.38905610e+00, 2.00855369e+01,
       5.45981500e+01, 1.48413159e+02, 4.03428793e+02, 1.09663316e+03,
       2.98095799e+03, 8.10308393e+03])

np.random.rand is for Uniform distribution (in the half-open interval [0.0, 1.0))
np.random.randn  is for Standard Normal distribution (mean 0 and variance 1)

In [98]:
x = np.random.randn(8)
y = np.random.randn(8)

In [99]:
x

array([ 0.2522826 , -0.10244274,  3.15488533, -0.05143196,  0.99348397,
       -0.16853653,  0.51379458, -0.624834  ])

In [100]:
y

array([ 0.72886595, -0.09522548, -0.12184518,  0.04423491,  0.17801967,
       -1.07626923,  0.05869552,  0.90149055])

In [101]:
np.maximum(x,y)

array([ 0.72886595, -0.09522548,  3.15488533,  0.04423491,  0.99348397,
       -0.16853653,  0.51379458,  0.90149055])

#### A few ufuncs return multiple arrays; 'modf' is one example. It is a vectorized version of Python's math.modf.
#### It returns the fractional and integral parts of a floating-point array.
#### Ufuncs accept an optional 'out' argument, which allows them to operate in-place on arrays.
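This sketch uses small made-up values. It shows `np.modf` returning two arrays, then shows the `out` argument writing results into an existing buffer instead of allocating a new array.

```python
import numpy as np

vals = np.array([2.5, -1.25, 3.0])
frac, whole = np.modf(vals)   # fractional parts 0.5, -0.25, 0.0; integral parts 2.0, -1.0, 3.0

# The out argument writes the result into an existing array instead of allocating a new one
buf = np.empty_like(vals)
np.abs(vals, out=buf)
np.sqrt(buf, out=buf)         # in-place: buf now holds sqrt(|vals|)
print(np.allclose(buf, np.sqrt(np.abs(vals))))   # True
```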

In [None]:
arr

In [None]:
remainder, whole_part = np.modf(arr)

In [None]:
remainder

In [None]:
whole_part

## Array-Oriented Programming with Arrays
#### We can use Numpy to express data processing tasks as concise array expressions and avoid writing loops.
#### Vectorization is the practice of replacing explicit loops with array expressions.
#### Vectorized array operations are often one or two (or more) orders of magnitude faster than their pure Python equivalents.

#### Example
#### Evaluate the function sqrt(x^2 + y^2) across a regular grid of values.
#### np.meshgrid takes two 1-D arrays and produces two 2-D matrices corresponding to all pairs of (x, y) in the two arrays.
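Before using the large grid, a tiny input makes it easier to see what `np.meshgrid` returns. The sketch below uses three x values and two y values, chosen only for illustration.

```python
import numpy as np

xs, ys = np.meshgrid([1, 2, 3], [10, 20])   # x values vary along columns, y values along rows
print(xs)
# [[1 2 3]
#  [1 2 3]]
print(ys)
# [[10 10 10]
#  [20 20 20]]

z = np.sqrt(xs ** 2 + ys ** 2)   # evaluated at every (x, y) pair at once
print(z.shape)                   # (2, 3)
```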

In [None]:
# range returns a lazy range object instead of building a list, to save memory
range(5)  

In [None]:
# wrap it in list() to get the actual elements
list(range(5))

In [None]:
# note the single 'r' in np.arange(start, stop, step); it works like Python's range
a= np.arange(100,20,-5)
a

In [None]:
points = np.arange(-5, 5, 0.01)
xs, ys = np.meshgrid(points, points)

In [None]:
ys

#### Now we will use the 2 arrays and use them in the expression.
#### We will use matplotlib to create visualisations of this 2-D array.

In [None]:
z = np.sqrt(xs ** 2 + ys ** 2)
z

In [None]:
%matplotlib inline

import matplotlib.pyplot as plt
plt.title(r"Image plot of $\sqrt{x^2 + y^2}$ for a grid of values")  # raw string keeps the backslash intact
plt.imshow(z, cmap=plt.cm.gray)
plt.colorbar()


### Expressing Conditional Logic as Array Operations
#### 'numpy.where' is a vectorized version of the ternary expression "x if condition else y".
#### Consider the following three arrays:

In [104]:
xarr = np.array([1.1, 1.2, 1.3, 1.4, 1.5])
yarr = np.array([2.1, 2.2, 2.3, 2.4, 2.5])
cond = np.array([True, False, True, True, False])

#### We want to take a value from xarr or yarr depending on the value in the cond array.
#### Let's check a Python List comprehension first. 

In [105]:
result = [(x if c else y)
         for x, y, c in zip(xarr, yarr, cond)]
result

[1.1, 2.2, 1.3, 1.4, 2.5]

#### There are multiple problems with this:
#### 1. It is slow for very large arrays, because all the work is done in interpreted Python code.
#### 2. It does not work with multidimensional arrays.
#### With np.where, this can be done very concisely.

In [106]:
result = np.where(cond, xarr, yarr)
result

array([1.1, 2.2, 1.3, 1.4, 2.5])

#### The second and third arguments do not need to be arrays; they can also be scalars.
#### A typical use is to produce a new array of values based on another array.

In [None]:
a= np.random.randint(9)
a

In [None]:
b= np.random.permutation(10)
b

In [None]:
c= np.random.permutation(np.arange(10))
c

In [107]:
arr = np.random.randn(4,4)
arr

array([[-0.07672637, -0.00651798,  1.19272088,  2.19054592],
       [ 0.13344926, -0.93969161,  0.23196008, -0.54993486],
       [-0.64243684, -1.38798273, -0.12509029, -1.13359634],
       [ 0.50104831,  1.21785559, -0.68075037, -0.42793873]])

In [108]:
d = np.random.randint(20,300)
d


104

In [109]:
arr

array([[-0.07672637, -0.00651798,  1.19272088,  2.19054592],
       [ 0.13344926, -0.93969161,  0.23196008, -0.54993486],
       [-0.64243684, -1.38798273, -0.12509029, -1.13359634],
       [ 0.50104831,  1.21785559, -0.68075037, -0.42793873]])

In [110]:
arr > 0

array([[False, False,  True,  True],
       [ True, False,  True, False],
       [False, False, False, False],
       [ True,  True, False, False]])

In [111]:
np.where(arr > 0, 2, -2)

array([[-2, -2,  2,  2],
       [ 2, -2,  2, -2],
       [-2, -2, -2, -2],
       [ 2,  2, -2, -2]])

In [112]:
np.where(arr > 0, 2, arr)

array([[-0.07672637, -0.00651798,  2.        ,  2.        ],
       [ 2.        , -0.93969161,  2.        , -0.54993486],
       [-0.64243684, -1.38798273, -0.12509029, -1.13359634],
       [ 2.        ,  2.        , -0.68075037, -0.42793873]])

## Mathematical and Statistical methods
#### There are methods in the array class that compute statistics about an entire array or about data along an axis.
#### You can use aggregations (reductions) either by calling array instance method or top-level NumPy function.
#### Eg - sum, mean, std, etc.

In [113]:
arr = np.random.randn(5,4)
arr

array([[ 0.60463974, -1.72143037,  0.28560397,  1.26851791],
       [ 0.85816725, -0.11049596, -0.98698885, -0.49842007],
       [ 1.45043808, -1.03509438, -1.35328222, -0.00199367],
       [ 1.79406209, -0.73842926,  0.10563966, -0.39219891],
       [ 1.1399876 ,  0.52114216, -0.92348846, -0.06175408]])

In [114]:
arr.mean()

0.010231110668776067

In [115]:
np.mean(arr)

0.010231110668776067

In [116]:
arr.sum()

0.20462221337552136

#### Aggregation functions take an optional 'axis' argument that computes statistics over a given axis.
#### For a 2-D array, axis=1 computes across the columns (one result per row) and axis=0 computes down the rows (one result per column).
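On a small array with known values, the `axis` argument is easy to check. The sketch below shows that the axis you pass is the one that gets collapsed.

```python
import numpy as np

arr = np.array([[1., 2., 3.],
                [4., 5., 6.]])   # shape (2, 3)

col_means = arr.mean(axis=0)   # collapses axis 0: one value per column
row_means = arr.mean(axis=1)   # collapses axis 1: one value per row
row_sums = arr.sum(axis=1)

print(col_means)   # [2.5 3.5 4.5]
print(row_means)   # [2. 5.]
print(row_sums)    # [ 6. 15.]
```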

In [117]:
arr.mean(axis=1)

array([ 0.10933281, -0.18443441, -0.23498305,  0.19226839,  0.16897181])

In [118]:
arr.mean(axis=0)

array([ 1.16945895, -0.61686156, -0.57450318,  0.06283023])

#### Some methods, such as cumsum and cumprod, do not aggregate and instead produce an array of intermediate results.
#### In multi-dimensional arrays, these accumulation functions return an array of the same size, with the partial aggregates computed along the indicated axis.
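The notebook below only shows `cumsum`. This sketch adds `cumprod` along each row, next to `cumsum` down each column, on a small integer array.

```python
import numpy as np

arr = np.array([[1, 2, 3],
                [4, 5, 6]])

prods = arr.cumprod(axis=1)   # running product along each row
print(prods)
# [[  1   2   6]
#  [  4  20 120]]

sums = arr.cumsum(axis=0)     # running sum down each column
print(sums)
# [[1 2 3]
#  [5 7 9]]
```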

In [119]:
arr = np.array([0,1,2,3,45,6,7])
arr.cumsum()

array([ 0,  1,  3,  6, 51, 57, 64])

In [120]:
arr = np.array([[0,1,2], [3,4,5], [6,7,8]])
arr

array([[0, 1, 2],
       [3, 4, 5],
       [6, 7, 8]])

In [121]:
arr.cumsum(axis=0)

array([[ 0,  1,  2],
       [ 3,  5,  7],
       [ 9, 12, 15]])

In [122]:
arr.cumsum(axis=1)

array([[ 0,  1,  3],
       [ 3,  7, 12],
       [ 6, 13, 21]])

### Methods for Boolean arrays
#### sum can be used to count the True values in a boolean array, since True counts as 1 and False as 0.
#### The method 'any' checks whether one or more values in an array are True.
#### The method 'all' checks whether every value is True.
#### These methods also work with non-boolean arrays, where non-zero elements evaluate to True.
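This section has no code cell, so here is a minimal sketch of the three methods described above, using small made-up arrays.

```python
import numpy as np

arr = np.array([-2., 0.5, 3., -0.1, 7.])
n_positive = (arr > 0).sum()        # True counts as 1, False as 0
print(n_positive)                   # 3

bools = np.array([False, False, True, False])
print(bools.any(), bools.all())     # True False

# Non-boolean arrays: non-zero elements count as True
print(np.array([0, 1, 2]).all())    # False, because of the 0
```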

## Sorting
#### Just like Python's list, ndarrays can also be sorted in-place with sort.
#### For multi-dimensional arrays, we can sort sections along a dimension by passing axis number.
#### np.sort returns a copy of the array instead of sorting in-place.
#### A quick way of getting quantiles is to sort the array and select the value at a particular rank.
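The difference between `np.sort` (returns a copy) and the `sort` method (sorts in place) is easiest to see on a tiny array. The sketch below also sorts a 2-D array along axis 0.

```python
import numpy as np

arr = np.array([3, 1, 2])
sorted_copy = np.sort(arr)   # returns a new sorted array
print(sorted_copy)           # [1 2 3]
print(arr)                   # [3 1 2] (unchanged)

arr.sort()                   # sorts in place and returns None
print(arr)                   # [1 2 3]
print(arr[::-1])             # [3 2 1]: a reversed view gives descending order

m = np.array([[3, 1], [2, 4]])
by_column = np.sort(m, axis=0)   # sort each column independently
print(by_column)
# [[2 1]
#  [3 4]]
```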

In [123]:
arr2 = np.round(10*np.random.rand(2,3))
arr2

array([[ 7.,  1.,  6.],
       [ 5., 10.,  8.]])

In [124]:
arr2 + 4

array([[11.,  5., 10.],
       [ 9., 14., 12.]])

In [125]:
arr1= np.array([ [2,4,6],
                [4,8,9] ] )
arr1

array([[2, 4, 6],
       [4, 8, 9]])

#### np.hstack is equivalent to concatenation along the second axis, except for 1-D arrays, where it concatenates along the first axis. It rebuilds arrays divided by `hsplit`.

In [126]:
c= np.hstack((arr1,arr2))
c

array([[ 2.,  4.,  6.,  7.,  1.,  6.],
       [ 4.,  8.,  9.,  5., 10.,  8.]])

In [127]:
arr = np.random.randn(6)
arr

array([-0.71574286, -1.28546917, -0.24715288, -1.85619445,  1.50696415,
       -0.19108297])

In [128]:
arr.sort()
arr

array([-1.85619445, -1.28546917, -0.71574286, -0.24715288, -0.19108297,
        1.50696415])

In [None]:
arr = arr[::-1]   # reverse the sorted array to get descending order
arr

In [129]:
arr = np.random.randn(5,3)
arr

array([[-0.27128355,  1.24600382,  1.62616815],
       [ 0.61168462, -0.03942427, -0.71657148],
       [-0.51974879,  0.05204405, -0.96055418],
       [-1.86892018, -1.24429494, -0.18702587],
       [ 0.50697995, -1.75552889, -0.33293992]])

In [130]:
arr.sort(1)   # sort each row: axis 1 is the last axis here, the same as the default axis=-1
arr

array([[-0.27128355,  1.24600382,  1.62616815],
       [-0.71657148, -0.03942427,  0.61168462],
       [-0.96055418, -0.51974879,  0.05204405],
       [-1.86892018, -1.24429494, -0.18702587],
       [-1.75552889, -0.33293992,  0.50697995]])

In [None]:
large_arr = np.random.randn(1000)
large_arr.sort()
large_arr[int(0.05 * len(large_arr))] # 5% quantile

## Unique and Other Set logic
#### NumPy has some basic set operations of 1-D arrays.
#### np.unique returns sorted unique values in an array.
#### np.in1d checks whether the values of one array are present in another (newer NumPy versions deprecate in1d in favor of np.isin).
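NumPy offers a few more 1-D set routines beyond `unique`. The sketch below shows intersection, union and difference on two small arrays, along with `np.isin`.

```python
import numpy as np

a = np.array([1, 2, 3, 4, 4])
b = np.array([3, 4, 5])

print(np.unique(a))           # [1 2 3 4]
print(np.intersect1d(a, b))   # [3 4]
print(np.union1d(a, b))       # [1 2 3 4 5]
print(np.setdiff1d(a, b))     # [1 2]
print(np.isin(a, b))          # [False False  True  True  True]
```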

In [131]:
names = np.array(['Bob', 'Joe', 'Will', 'Bob', 'Will', 'Joe', 'Joe'])
np.unique(names)

array(['Bob', 'Joe', 'Will'], dtype='<U4')

In [132]:
ints = np.array([3, 3, 3, 2, 2, 1, 1, 4, 4])
np.unique(ints)

array([1, 2, 3, 4])

In [133]:
values = np.array([6,0,0,2,3,5,6])
np.in1d(values, [2,3,6])

array([ True, False, False,  True,  True, False,  True])

## File Input and Output with Arrays
#### NumPy can save and load data to and from disk, in text or binary format.
#### Since most people use pandas for loading text or tabular data, we focus here on NumPy's built-in binary format.
#### np.save and np.load save arrays to disk and load them back.
#### By default, arrays are saved in an uncompressed raw binary format with the file extension '.npy'.
#### If the file path does not already end in .npy, the extension is appended.
#### The array on disk can then be loaded with np.load.
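The round trip can be tested without leaving files behind. This sketch writes into a temporary directory, which is an assumption made only to keep it self-contained. It confirms that `.npy` is appended and that the loaded array matches.

```python
import os
import tempfile
import numpy as np

arr = np.arange(10)
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'some_array')   # no extension given
    np.save(path, arr)                        # NumPy appends '.npy'
    saved_name = os.listdir(tmp)[0]
    loaded = np.load(path + '.npy')

print(saved_name)                   # some_array.npy
print(np.array_equal(arr, loaded))  # True
```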

In [None]:
arr = np.arange(10)
np.save('some_array', arr)

In [None]:
np.load('some_array.npy')

#### To save multiple arrays in an uncompressed archive, use np.savez. The file extension is '.npz'.
#### Pass the arrays as keyword arguments; the keywords become their labels.
#### When loading an .npz file, NumPy returns a dictionary-like object with the labels as keys and the arrays as values.
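This sketch saves two labelled arrays to an `.npz` archive in a temporary directory, an assumption made for self-containment. It then reads them back through the dictionary-like object that `np.load` returns.

```python
import os
import tempfile
import numpy as np

arr = np.arange(10)
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'array_archive.npz')
    np.savez(path, a=arr, b=arr * 2)          # keyword names become the keys
    with np.load(path) as arch:               # dictionary-like object, arrays loaded on access
        keys = sorted(arch.files)
        b = arch['b']                         # reading a key returns a regular array

print(keys)      # ['a', 'b']
print(b[:3])     # [0 2 4]
```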

In [None]:
np.savez('array_archive.npz', a=arr, b=arr)

In [None]:
arch = np.load('array_archive.npz')
arch['b']

#### If your data compresses well, use the np.savez_compressed command instead.

In [None]:
np.savez_compressed('arrays_compressed.npz', a=arr, b=arr)

## Linear Algebra
#### Using * between two 2-D arrays performs element-wise multiplication, not a matrix product.
#### Hence NumPy has the function 'dot' for matrix multiplication; x.dot(y) is equivalent to np.dot(x, y).
#### The matrix product between a 2-D array and a 1-D array of the right size results in a 1-D array.
#### The '@' symbol (Python 3.5 and later) also works as an infix operator for matrix multiplication.

The number of columns in the first matrix must be equal to the number of rows in the second matrix for the dot product to be defined. If A is an (m x n) matrix and B is an (n x p) matrix, the resulting matrix C from A.dot(B) will be an (m x p) matrix.
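The shape rule above can be checked with arrays of ones, where every entry of the product is simply the shared inner size n. The sketch also shows that mismatched inner sizes raise `ValueError`.

```python
import numpy as np

A = np.ones((2, 3))   # m x n = 2 x 3
B = np.ones((3, 4))   # n x p = 3 x 4

C = A @ B             # same as A.dot(B) and np.dot(A, B)
print(C.shape)        # (2, 4): m x p
print(C[0, 0])        # 3.0: each entry sums n = 3 products of 1 * 1

try:
    A @ A             # (2, 3) @ (2, 3): inner sizes 3 and 2 differ
    mismatch = False
except ValueError:
    mismatch = True
print(mismatch)       # True
```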

In [None]:
x = np.array([[1.,2.,3.],[4.,5.,6.]])     # shape (2, 3)
y = np.array([[6.,23.],[-1,7],[8,9]])     # shape (3, 2)
x.dot(y)   # (2, 3) dot (3, 2) -> (2, 2)

# y.dot(x) is also valid: (3, 2) dot (2, 3) -> (3, 3), a different matrix

In [None]:
np.dot(x,y)

In [None]:
np.ones(3)

In [None]:
x

In [None]:
np.dot(x, np.ones(3))   # (2, 3) dot (3,) -> 1-D result of shape (2,)

In [None]:
x @ np.ones(3)

#### The module numpy.linalg has a standard set of functions for matrix decompositions and operations such as the inverse and determinant.
#### They are implemented under the hood via the same industry-standard linear algebra libraries used in other languages like R and MATLAB, e.g. BLAS, LAPACK or Intel MKL.
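The sketch below checks an inverse numerically and solves a linear system with `np.linalg.solve`. The seeded `default_rng` is an assumption added for reproducibility. For a random X, X.T @ X is almost surely invertible, but that is a property of this construction, not a guarantee for arbitrary input.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((5, 5))
mat = X.T @ X                      # symmetric; invertible for almost any random X

inv_mat = np.linalg.inv(mat)
print(np.allclose(mat @ inv_mat, np.eye(5)))   # True, up to floating-point error

b = np.arange(5.0)
x = np.linalg.solve(mat, b)        # solves mat @ x = b without forming the inverse
print(np.allclose(mat @ x, b))     # True
```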

In [None]:
from numpy.linalg import inv, qr

X = np.random.randn(5,5)
mat = X.T.dot(X)
inv(mat)

In [None]:
mat.dot(inv(mat))

In [None]:
q, r = qr(mat)

In [None]:
r

## Pseudorandom Number Generation
#### The module numpy.random provides functions for efficiently generating arrays of random values from many kinds of probability distributions.
#### Python's built-in random module samples only one value at a time, whereas numpy.random is well over an order of magnitude faster for generating very large samples.

In [None]:
samples = np.random.normal(size=(4,4))
samples

In [None]:
from random import normalvariate

N = 1000000

%timeit samples = [normalvariate(0,1) for _ in range(N)]

In [None]:
%timeit np.random.normal(size=N)

#### The numbers generated are called 'pseudorandom numbers' because they are generated by an algorithm with deterministic behaviour based on the 'seed' of the random number generator.
#### By default, the numpy.random functions use a global random state. To avoid it, use numpy.random.RandomState to create an isolated generator.
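Seeding makes pseudorandom streams reproducible. The sketch below shows that two `RandomState` objects with the same seed agree. It also shows the same behaviour with `np.random.default_rng`, the newer Generator interface available since NumPy 1.17, which is mentioned here only as an alternative.

```python
import numpy as np

# Two generators with the same seed produce the same stream
rs1 = np.random.RandomState(1234)
rs2 = np.random.RandomState(1234)
same_legacy = np.array_equal(rs1.randn(5), rs2.randn(5))
print(same_legacy)   # True

# The newer Generator API behaves the same way
g1 = np.random.default_rng(1234)
g2 = np.random.default_rng(1234)
same_new = np.array_equal(g1.standard_normal(5), g2.standard_normal(5))
print(same_new)      # True
```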

In [None]:
rng = np.random.RandomState(1234)
rng.randn(10)