### NumPy - Numerical Python
#### KG
* Fundamental package for scientific computing in Python
* Basis for other major data science libraries such as Pandas
* Multi-dimensional array library
    * can store data in any number of dimensions - 1D, 2D, 3D, ...

* A NumPy array stores its data in a single block of memory. NumPy's library of algorithms, written in C, can operate on that block without per-element type checking or other overhead. NumPy arrays also use much less memory than built-in Python sequences.


Advantages of NumPy over Lists:
* Lists are slow for numerical work, whereas NumPy arrays are fast
* NumPy uses fixed types 
    Ex: 5 --> 00000101 (fits in 1 byte)
    NumPy stores every element in a fixed-width type. As Int32, 5 takes 4 bytes
    -- 00000000 00000000 00000000 00000101
    (If no type is given, NumPy picks a default integer type, which is Int64, i.e. 8 bytes, on most 64-bit platforms.)
    
    We can also tell NumPy that 32 bits are not needed and 16 are enough
    So Int16 --> 00000000 00000101
    Or, for small values, Int8 --> 00000101
    
    
    With lists, a lot more information has to be stored for each integer
    
    A list holds references to Python's built-in int objects. Each int object carries its value (the actual bits), its object type, a reference count (how many references point at that integer) and the size of the integer value
    
    So in memory -- reference count (8 bytes), object type pointer (8 bytes), 
    object size (8 bytes), plus the value itself (at least 4 bytes): roughly 28 bytes on 64-bit CPython.
    
    And that is for a single integer in a list - far more space than NumPy needs
    
* Because NumPy reads fewer bytes of memory, it is a lot faster
* With NumPy, when iterating over the values in an array, we don't have to do a type check every time
* In lists, elements can have different types (int, float etc.), so the type of each element has to be checked
* NumPy --> utilizes contiguous memory
    Imagine a block of memory 
    Lists --> the elements are scattered -- their memory blocks are not necessarily next to each other
              A list is an array of pointers to the actual elements, and those elements can live anywhere in memory -- so it is inefficient, especially for slicing, taking subsets, applying functions etc. Very slow
              
              
    NumPy --> uses one contiguous block. It only needs to store the starting point, the total size of the block, the element type etc.
       Benefits: CPUs have SIMD vector processing units, which can be used when the elements are all next to one another in memory
       
       SIMD - single instruction, multiple data
       Ex: To add values, instead of doing one addition at a time, the CPU can perform several additions at once if the values are next to each other.
       
       The cache is also used more effectively (Effective Cache Utilization) --> values loaded together stay close to where we need to access them and operate on them.
       Whereas with a list, we may load half of the elements while the other half is scattered elsewhere, so we have to go back and load those into cache again --> takes more time and more processing
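
The memory argument above can be checked directly. The following is a rough sketch, not a benchmark: it assumes 64-bit CPython, and the variable names (`values`, `arr`, `list_bytes`, `array_bytes`) are illustrative only.

```python
import sys
import numpy as np

values = list(range(1000))                 # plain Python list of int objects
arr = np.array(values, dtype=np.int32)     # fixed-width 4-byte integers

# The list stores one pointer per element, and each element is a full
# Python int object (about 28 bytes each on 64-bit CPython).
list_bytes = sys.getsizeof(values) + sum(sys.getsizeof(v) for v in values)

# The array stores only the raw values: 1000 elements * 4 bytes.
array_bytes = arr.nbytes

print(list_bytes, array_bytes)
print(arr.flags['C_CONTIGUOUS'])           # True: one contiguous block
```

The exact list figure varies with the Python version, but it should come out several times larger than the 4000 bytes the array needs.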
       
       

Lists vs NumPy

Lists --> insertion, deletion, appending, concatenation etc.
NumPy --> all the above and much more

Big difference: arithmetic operators do not work element-wise on lists
       

In [3]:
a = [1,3,5]
b = [1,2,3]

a * b

TypeError: can't multiply sequence by non-int of type 'list'

In [5]:
## Can do the above element-wise with a comprehension
a_b = [x*y for x, y in zip(a, b)]
a_b

[1, 6, 15]

In [6]:
## But NumPy does this more simply and more efficiently
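
A minimal sketch of the NumPy version of the multiplication attempted above. The variable `product` is an illustrative name, not one used elsewhere in these notes.

```python
import numpy as np

a = np.array([1, 3, 5])
b = np.array([1, 2, 3])

# On arrays, * multiplies element by element; the loop runs in compiled code
product = a * b
print(product)   # [ 1  6 15]
```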

### Applications
* Powerful for mathematics (MATLAB replacement) - SciPy offers more specialised routines, but NumPy on its own is still powerful
* Plotting 
* Backend (Core for Pandas)
* Can store images
* Machine Learning - Tensor library pretty similar to numpy

In [8]:
import numpy as np
dir(np)

['ALLOW_THREADS',
 'AxisError',
 'BUFSIZE',
 'CLIP',
 'DataSource',
 'ERR_CALL',
 'ERR_DEFAULT',
 'ERR_IGNORE',
 'ERR_LOG',
 'ERR_PRINT',
 'ERR_RAISE',
 'ERR_WARN',
 'FLOATING_POINT_SUPPORT',
 'FPE_DIVIDEBYZERO',
 'FPE_INVALID',
 'FPE_OVERFLOW',
 'FPE_UNDERFLOW',
 'False_',
 'Inf',
 'Infinity',
 'MAXDIMS',
 'MAY_SHARE_BOUNDS',
 'MAY_SHARE_EXACT',
 'NAN',
 'NINF',
 'NZERO',
 'NaN',
 'PINF',
 'PZERO',
 'RAISE',
 'SHIFT_DIVIDEBYZERO',
 'SHIFT_INVALID',
 'SHIFT_OVERFLOW',
 'SHIFT_UNDERFLOW',
 'ScalarType',
 'Tester',
 'TooHardError',
 'True_',
 'UFUNC_BUFSIZE_DEFAULT',
 'UFUNC_PYVALS_NAME',
 'WRAP',
 '_CopyMode',
 '_NoValue',
 '_UFUNC_API',
 '__NUMPY_SETUP__',
 '__all__',
 '__builtins__',
 '__cached__',
 '__config__',
 '__deprecated_attrs__',
 '__dir__',
 '__doc__',
 '__expired_functions__',
 '__file__',
 '__future_scalars__',
 '__getattr__',
 '__git_version__',
 '__loader__',
 '__name__',
 '__package__',
 '__path__',
 '__spec__',
 '__version__',
 '_add_newdoc_ufunc',
 '_distributor_init',

### Initializing an array

In [30]:
# 1-D
ex_arr1 = np.array([1,2,3])
ex_arr1

array([1, 2, 3])

In [13]:
# 2-D
ex_arr2 = np.array([[1.0,2.0,3.0], [4.0,5.0,6.0]])
ex_arr2

array([[1., 2., 3.],
       [4., 5., 6.]])

In [14]:
# Another 2-D array (3x3): three rows are still only two levels of nesting
ex_arr3 = np.array([[1.0,2.0,3.0], [4.0,5.0,6.0], [7.0,8.0,9.0]])
ex_arr3

array([[1., 2., 3.],
       [4., 5., 6.],
       [7., 8., 9.]])
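
The number of dimensions follows the nesting depth of the input, so the 3x3 array above is two-dimensional. For comparison, here is a small sketch of a genuinely three-dimensional array; `ex_arr3d` is a hypothetical name introduced only for this example.

```python
import numpy as np

# Three levels of nesting: 2 blocks, each holding 2 rows of 3 values
ex_arr3d = np.array([[[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]],
                     [[7.0, 8.0, 9.0], [10.0, 11.0, 12.0]]])

print(ex_arr3d.ndim)    # 3
print(ex_arr3d.shape)   # (2, 2, 3)
```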

### Get dimensions of np array

In [15]:
ex_arr1.ndim

1

In [16]:
ex_arr2.ndim

2

In [17]:
ex_arr3.shape

(3, 3)

In [18]:
ex_arr2.shape

(2, 3)

In [19]:
ex_arr1.shape

(3,)

### Get the type of np array elements
* Would be helpful to see how much memory we are taking up

In [21]:
ex_arr1.dtype

dtype('int64')

In [22]:
## Can specify specific types
ex_arr1 = np.array([1,2,3], dtype='int32')
ex_arr1.dtype

dtype('int32')

In [24]:
# Can even say int16 so that takes up less memory
ex_arr1 = np.array([1,2,3], dtype='int16')
ex_arr1.dtype

dtype('int16')

### Get the element size
* itemsize --> gives the size of each array item

In [25]:
ex_arr1.itemsize

2

In [26]:
ex_arr2.itemsize

8

* nbytes --> gives the total bytes used for array elements

In [29]:
# total size with all the elements
ex_arr1.nbytes

6
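
The three attributes are related: `nbytes` is the number of elements (`size`) times `itemsize`. A short sketch with an illustrative array `arr`:

```python
import numpy as np

arr = np.array([[1, 2, 3], [4, 5, 6]], dtype=np.int16)

# 6 elements of 2 bytes each
print(arr.size, arr.itemsize, arr.nbytes)   # 6 2 12
```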

### Accessing/Changing Specific Elements, Rows, Columns

In [32]:
ex_arr = np.array([[1,2,3,4,5], [6,7,8,9,10]])
ex_arr

array([[ 1,  2,  3,  4,  5],
       [ 6,  7,  8,  9, 10]])

In [33]:
# Get a specific element [r,c]
ex_arr[1,2]

8

In [34]:
# Can also use negative indices, as with lists
ex_arr[-1, -2]

9

In [35]:
# Get a specific row
ex_arr[0, :]

array([1, 2, 3, 4, 5])

In [36]:
# Get a specific column
ex_arr[:, 3]

array([4, 9])

In [38]:
# Getting elements with stepsize
ex_arr[0, 1:4:2]

array([2, 4])

In [40]:
# Can also use a negative index as the stop value
ex_arr[0, 1:-1:2]

array([2, 4])

In [42]:
# Change an element value
print(ex_arr[0, 2])

ex_arr[0, 2] = 13

print(ex_arr[0, 2])

3
13


In [43]:
# Can do the same for series of elements
ex_arr[:, 2] = 55
ex_arr

array([[ 1,  2, 55,  4,  5],
       [ 6,  7, 55,  9, 10]])

In [44]:
# To assign different numbers, pass a sequence with the same shape as the selected slice
ex_arr[:, 2] = [50, 550]
ex_arr

array([[  1,   2,  50,   4,   5],
       [  6,   7, 550,   9,  10]])

In [45]:
## 3D Example

arr1 = np.array([[[1,2], [3,4]],[[5,6], [7,8]]])
arr1

array([[[1, 2],
        [3, 4]],

       [[5, 6],
        [7, 8]]])

In [46]:
# Get specific element - work outside in

arr1[0,1,1]

4

In [47]:
arr1[:,1,:]

array([[3, 4],
       [7, 8]])

In [48]:
# Replace
arr1[:,1,:] = [[11,11], [22,22]]
arr1

array([[[ 1,  2],
        [11, 11]],

       [[ 5,  6],
        [22, 22]]])

### Initializing Different Types of Arrays

#### To initialize with zeros

In [49]:
np.zeros(5)

array([0., 0., 0., 0., 0.])

In [50]:
np.zeros((2,3))

array([[0., 0., 0.],
       [0., 0., 0.]])

In [51]:
np.zeros((2,3,3))

array([[[0., 0., 0.],
        [0., 0., 0.],
        [0., 0., 0.]],

       [[0., 0., 0.],
        [0., 0., 0.],
        [0., 0., 0.]]])

In [52]:
np.zeros((2,3,3,2))

array([[[[0., 0.],
         [0., 0.],
         [0., 0.]],

        [[0., 0.],
         [0., 0.],
         [0., 0.]],

        [[0., 0.],
         [0., 0.],
         [0., 0.]]],


       [[[0., 0.],
         [0., 0.],
         [0., 0.]],

        [[0., 0.],
         [0., 0.],
         [0., 0.]],

        [[0., 0.],
         [0., 0.],
         [0., 0.]]]])

#### To initialize with all 1's

In [53]:
np.ones((4,2,2), dtype='int16')

array([[[1, 1],
        [1, 1]],

       [[1, 1],
        [1, 1]],

       [[1, 1],
        [1, 1]],

       [[1, 1],
        [1, 1]]], dtype=int16)

#### With any other number

In [54]:
np.full((2,2), 77)

array([[77, 77],
       [77, 77]])

In [55]:
np.full((2,2), 77, dtype='float32')

array([[77., 77.],
       [77., 77.]], dtype=float32)

#### NOT NEEDED - With any other number - full_like - to reuse array that is already built
* Reuses the shape (and dtype) of an array that is already built - pass that array and the new array is filled with the required value

In [56]:
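# Note: passing ex_arr.shape uses the tuple (2, 5) itself as the template, so the result has only 2 elements
np.full_like(ex_arr.shape, 4)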

array([4, 4])

In [57]:
np.full_like(ex_arr, 4)

array([[4, 4, 4, 4, 4],
       [4, 4, 4, 4, 4]])

#### Initialize a matrix of random numbers

In [58]:
np.random.rand(4,2)

array([[0.65295455, 0.79995466],
       [0.90760464, 0.83192036],
       [0.05169649, 0.91742597],
       [0.7482415 , 0.59401975]])

In [59]:
np.random.rand(4,2,3)

array([[[0.39064491, 0.23011706, 0.58154943],
        [0.57483991, 0.93295594, 0.60584048]],

       [[0.53599605, 0.0851813 , 0.40343787],
        [0.46372972, 0.8711919 , 0.50839873]],

       [[0.14003404, 0.09722026, 0.26786442],
        [0.97624371, 0.23464637, 0.8756309 ]],

       [[0.34518989, 0.5125818 , 0.62305656],
        [0.00970061, 0.20224372, 0.76763015]]])

In [60]:
## NOT NEEDED - to pass in a sample
np.random.random_sample(ex_arr.shape)

array([[0.53901229, 0.84000494, 0.06964747, 0.8661529 , 0.25036995],
       [0.07387293, 0.52499566, 0.43825524, 0.84521926, 0.02916312]])

#### To initialize with random integer values

In [62]:
np.random.randint(7, size=(3,3))

array([[0, 6, 3],
       [2, 6, 6],
       [1, 2, 3]])

In [63]:
np.random.randint(4,7, size=(3,3))

array([[6, 6, 5],
       [4, 5, 6],
       [5, 4, 5]])

In [64]:
np.random.randint(-4,8, size=(3,3))

array([[-3,  1,  0],
       [ 4,  3,  2],
       [ 5,  3,  6]])
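
The calls above use NumPy's legacy global random functions. As a hedged alternative, the sketch below uses the `Generator` interface from `np.random.default_rng`, which allows a seed for reproducible results. The seed value and the names `rng`, `floats` and `ints` are arbitrary choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(seed=42)      # the seed value is arbitrary

floats = rng.random((4, 2))               # uniform floats in [0, 1)
ints = rng.integers(4, 7, size=(3, 3))    # integers in [4, 7): high is exclusive

print(floats)
print(ints)
```

As with `np.random.randint`, the upper bound is excluded, so the integer array here contains only 4, 5 and 6.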

#### Identity matrix - needs only one parameter because it is always a square matrix

In [65]:
np.identity(4)

array([[1., 0., 0., 0.],
       [0., 1., 0., 0.],
       [0., 0., 1., 0.],
       [0., 0., 0., 1.]])

#### NOT NEEDED - Repeat an array

In [67]:
arr = np.array([1,2,3])
rep_arr = np.repeat(arr, 3)

rep_arr

array([1, 1, 1, 2, 2, 2, 3, 3, 3])

In [69]:
arr = np.array([[1,2,3]])
rep_arr = np.repeat(arr, 3, axis=0)

rep_arr

array([[1, 2, 3],
       [1, 2, 3],
       [1, 2, 3]])

In [70]:
arr = np.array([[1,2,3]])
rep_arr = np.repeat(arr, 3, axis=1)

rep_arr

array([[1, 1, 1, 2, 2, 2, 3, 3, 3]])

Exercise: build the following array using the functions above

* 1 1 1 1 1
* 1 0 0 0 1
* 1 0 9 0 1
* 1 0 0 0 1
* 1 1 1 1 1

In [77]:
test_arr = np.ones((5,5), dtype='int16')
test_arr

array([[1, 1, 1, 1, 1],
       [1, 1, 1, 1, 1],
       [1, 1, 1, 1, 1],
       [1, 1, 1, 1, 1],
       [1, 1, 1, 1, 1]], dtype=int16)

In [78]:
test_arr[1:4,1:4]

array([[1, 1, 1],
       [1, 1, 1],
       [1, 1, 1]], dtype=int16)

In [79]:
test_arr[1:4,1:4] = np.repeat(np.array([[0,0,0]]), 3, axis=0)

test_arr

array([[1, 1, 1, 1, 1],
       [1, 0, 0, 0, 1],
       [1, 0, 0, 0, 1],
       [1, 0, 0, 0, 1],
       [1, 1, 1, 1, 1]], dtype=int16)

In [80]:
test_arr[2,2] = 9
test_arr

array([[1, 1, 1, 1, 1],
       [1, 0, 0, 0, 1],
       [1, 0, 9, 0, 1],
       [1, 0, 0, 0, 1],
       [1, 1, 1, 1, 1]], dtype=int16)
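
The same pattern can be built a little more directly, because a scalar assigned to a slice is broadcast over the whole slice. This is one possible sketch; `pattern` is an illustrative name.

```python
import numpy as np

pattern = np.ones((5, 5), dtype='int16')
pattern[1:4, 1:4] = 0    # the scalar 0 is broadcast across the 3x3 interior
pattern[2, 2] = 9        # centre element
print(pattern)
```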

### Be careful when copying arrays

In [81]:
a = np.array([1,2,3])
a

array([1, 2, 3])

In [82]:
b = a
b

array([1, 2, 3])

In [83]:
b[0] = 100
print(a)
print(b)

[100   2   3]
[100   2   3]


In [85]:
a = np.array([1,2,3])
b = a.copy()
b[0] = 200
print(a)
print(b)

[1 2 3]
[200   2   3]
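
Plain assignment is not the only way two names end up sharing data: a basic slice of an array is a view onto the same memory. The sketch below uses `np.shares_memory` to tell the cases apart; the names `alias`, `view` and `copy` are illustrative.

```python
import numpy as np

a = np.array([1, 2, 3, 4])

alias = a            # same array object under a second name
view = a[1:3]        # basic slice: new array object, shared data
copy = a.copy()      # independent data

view[0] = 99         # also changes a[1]
copy[0] = -1         # a is unaffected

print(a)                             # [ 1 99  3  4]
print(alias is a)                    # True
print(np.shares_memory(a, view))     # True
print(np.shares_memory(a, copy))     # False
```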


### Mathematics

In [86]:
a = np.array([1,2,3])
a

array([1, 2, 3])

In [91]:
# Element wise addition, subtraction
a + 2

array([3, 4, 5])

In [94]:
a += 2
a

array([7, 8, 9])

In [88]:
a - 2

array([-1,  0,  1])

In [89]:
a * 2

array([2, 4, 6])

In [90]:
a / 2

array([0.5, 1. , 1.5])

In [96]:
b = np.array([1,0,1])
a + b

array([ 8,  8, 10])

In [97]:
a ** 2

array([49, 64, 81])

In [98]:
# Take the cosine of all values
np.cos(a)

array([ 0.75390225, -0.14550003, -0.91113026])

### Linear Algebra
* Linear algebra operations such as matrix multiplication are not element-wise

In [102]:
a = np.full((2,3), 1)
a

array([[1, 1, 1],
       [1, 1, 1]])

In [103]:
b = np.full((3,2), 2)
b

array([[2, 2],
       [2, 2],
       [2, 2]])

In [104]:
np.matmul(a,b)

array([[6, 6],
       [6, 6]])
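
To make the contrast with element-wise arithmetic concrete, the sketch below reuses the same two arrays. The `@` operator is equivalent to `np.matmul` for these 2-D inputs, while `*` requires shapes that match (or can be broadcast).

```python
import numpy as np

a = np.full((2, 3), 1)
b = np.full((3, 2), 2)

print(a @ b)          # matrix product, shape (2, 2)
print(a * b.T)        # element-wise product of two (2, 3) arrays

try:
    a * b             # (2, 3) * (3, 2) cannot be broadcast together
except ValueError as err:
    print("element-wise product failed:", err)
```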

### NOT NEEDED

In [105]:
c = np.identity(3)
c

array([[1., 0., 0.],
       [0., 1., 0.],
       [0., 0., 1.]])

In [106]:
# Find the determinant; can also do eigenvalues, singular value decomposition, inverse etc.
np.linalg.det(c)

1.0
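
A brief sketch of the other `np.linalg` routines mentioned in the comment above, applied to a small diagonal matrix chosen so the results are easy to verify by hand. The matrix `m` and the result names are illustrative.

```python
import numpy as np

m = np.array([[2.0, 0.0],
              [0.0, 4.0]])

det = np.linalg.det(m)            # 2 * 4 = 8 (up to floating-point rounding)
inv = np.linalg.inv(m)            # diagonal entries 1/2 and 1/4
eigvals = np.linalg.eigvals(m)    # 2 and 4 for this diagonal matrix
u, s, vt = np.linalg.svd(m)       # s holds singular values, largest first

print(det, inv, eigvals, s, sep="\n")
```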

### Statistics

In [107]:
stats = np.array([[1,2,3], [4,5,6]])
stats

array([[1, 2, 3],
       [4, 5, 6]])

In [110]:
np.min(stats, axis=1)

array([1, 4])

In [111]:
np.min(stats, axis=0)

array([1, 2, 3])

In [109]:
np.max(stats)

6

In [112]:
np.sum(stats)

21

In [113]:
np.sum(stats, axis=0)

array([5, 7, 9])
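
The same `axis` convention applies to the other summary functions. A short sketch with the same array: `axis=0` collapses the rows (one result per column) and `axis=1` collapses the columns (one result per row).

```python
import numpy as np

stats = np.array([[1, 2, 3], [4, 5, 6]])

print(np.mean(stats))            # 3.5
print(np.mean(stats, axis=0))    # [2.5 3.5 4.5]
print(np.mean(stats, axis=1))    # [2. 5.]
print(np.median(stats))          # 3.5
print(np.std(stats, axis=1))     # population standard deviation of each row
```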

### Reorganizing Arrays

In [114]:
ini_arr = np.array([[1,2,3,4], [5,6,7,8]])
print(ini_arr)

[[1 2 3 4]
 [5 6 7 8]]


In [115]:
reshape_arr  = ini_arr.reshape(8,1)
reshape_arr

array([[1],
       [2],
       [3],
       [4],
       [5],
       [6],
       [7],
       [8]])

In [116]:
reshape_arr  = ini_arr.reshape(2,2,2)
reshape_arr

# Works as long as using the exact number of elements available

array([[[1, 2],
        [3, 4]],

       [[5, 6],
        [7, 8]]])
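
Two related behaviours are worth a quick sketch: one dimension may be given as `-1` so that NumPy infers it from the element count, and a shape that does not match the element count raises a `ValueError`.

```python
import numpy as np

ini_arr = np.array([[1, 2, 3, 4], [5, 6, 7, 8]])

print(ini_arr.reshape(4, -1).shape)   # (4, 2): -1 is inferred from the size
print(ini_arr.reshape(-1))            # [1 2 3 4 5 6 7 8]

try:
    ini_arr.reshape(3, 3)             # 9 slots for 8 elements
except ValueError as err:
    print("reshape failed:", err)
```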

### Vertically stacking arrays

In [118]:
v1 = np.array([1,2,3,4])
v2 = np.array([5,6,7,8])

In [119]:
# v1 and v2 are separate arrays but we can stack one on top of another

np.vstack([v1,v2])

array([[1, 2, 3, 4],
       [5, 6, 7, 8]])

In [120]:
# can keep stacking
np.vstack([v1,v2,v2,v2])

array([[1, 2, 3, 4],
       [5, 6, 7, 8],
       [5, 6, 7, 8],
       [5, 6, 7, 8]])

### Horizontally stacking arrays

In [122]:
h1 = np.ones((2,4))
h2 = np.zeros((2,2))

print(h1)
print(h2)

[[1. 1. 1. 1.]
 [1. 1. 1. 1.]]
[[0. 0.]
 [0. 0.]]


In [124]:
np.hstack((h1, h2))

array([[1., 1., 1., 1., 0., 0.],
       [1., 1., 1., 1., 0., 0.]])
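
For 2-D inputs, `np.hstack` behaves like `np.concatenate` along `axis=1`, and stacking only works when the other dimension matches. A small sketch using the same two arrays; `side_by_side` is an illustrative name.

```python
import numpy as np

h1 = np.ones((2, 4))
h2 = np.zeros((2, 2))

side_by_side = np.concatenate((h1, h2), axis=1)   # same result as np.hstack here
print(side_by_side.shape)                         # (2, 6)

try:
    np.vstack((h1, h2))     # column counts differ: 4 vs 2
except ValueError as err:
    print("vstack failed:", err)
```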

### Loading data from file

In [128]:
test_data = np.genfromtxt('np_data', delimiter=',')
test_data # values are automatically cast to float

array([[  1.,  13.,  21.,  11., 196.,  75.],
       [  3.,  42.,  12.,  33., 766.,  75.],
       [  1.,  22.,  33.,  11., 999.,  11.]])

In [129]:
test_data = test_data.astype('int32') # astype returns a new array rather than converting in place, because the element sizes differ
test_data
# genfromtxt handles line breaks properly

array([[  1,  13,  21,  11, 196,  75],
       [  3,  42,  12,  33, 766,  75],
       [  1,  22,  33,  11, 999,  11]], dtype=int32)
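
The `np_data` file itself is not shown, so the sketch below substitutes an in-memory string whose contents are reconstructed from the output above (an assumption, not the actual file). It also shows that `genfromtxt` accepts a `dtype` argument, which avoids the separate `astype` step.

```python
import io
import numpy as np

# Hypothetical stand-in for the contents of the 'np_data' file
text = "1,13,21,11,196,75\n3,42,12,33,766,75\n1,22,33,11,999,11\n"

as_float = np.genfromtxt(io.StringIO(text), delimiter=',')
as_int = np.genfromtxt(io.StringIO(text), delimiter=',', dtype='int32')

print(as_float.dtype)   # float64
print(as_int.dtype)     # int32
print(as_int)
```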

### Boolean Masking & Advanced Indexing

In [131]:
# where in data the value is > 50
test_data > 50

array([[False, False, False, False,  True,  True],
       [False, False, False, False,  True,  True],
       [False, False, False, False,  True, False]])

In [132]:
# indexing where data is > 50
test_data[test_data > 50] 

array([196,  75, 766,  75, 999], dtype=int32)

In [133]:
# can index with a list in numpy
a = np.array([1,2,3,4,5,6,7,8,9])
a[[1,2,8]]

array([2, 3, 9])

In [141]:
## For each column, even if one value is > 50 --> return True
np.any(test_data > 50, axis = 0)

array([False, False, False, False,  True,  True])

In [142]:
## For each column, if all values are > 50 --> return True
np.all(test_data > 50, axis = 0)

array([False, False, False, False,  True, False])

In [138]:
## Which row has all values > 50
np.all(test_data > 50, axis = 1)

array([False, False, False])

In [139]:
# > 50 & < 100
((test_data > 50) & (test_data < 100))

array([[False, False, False, False, False,  True],
       [False, False, False, False, False,  True],
       [False, False, False, False, False, False]])

In [140]:
# not (> 50 & < 100): the ~ must wrap the whole combined condition
~((test_data > 50) & (test_data < 100))

array([[ True,  True,  True,  True,  True, False],
       [ True,  True,  True,  True,  True, False],
       [ True,  True,  True,  True,  True,  True]])
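
Boolean masks can also be used to modify or summarise data. The sketch below uses a copy of the same values under the illustrative name `data`; `capped` and `labels` are likewise hypothetical names.

```python
import numpy as np

data = np.array([[1, 13, 21, 11, 196, 75],
                 [3, 42, 12, 33, 766, 75],
                 [1, 22, 33, 11, 999, 11]])

capped = data.copy()
capped[capped > 100] = 100              # assign through a boolean mask

labels = np.where(data > 50, 1, 0)      # 1 where the condition holds, else 0

print(capped)
print((data > 50).sum())                # 5 values exceed 50
```

Summing a boolean array counts its `True` entries, which is a convenient way to count matches.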