

### Numpy
* Numpy means Numerical Python
* Python's Linear Algebra library
* Gives us multi dimensional (multi-dim) array

### Why do we need Numpy?
 * Memory requirement: Numpy lists require less memory than Python ones, see below example.
 * Operations on Numpy list are faster than that on normal lists.
 * Numpy is more convenient and has wider functionality.

In [1]:
# Memory requirement: Numpy lists require less memory than Python ones, see below example.

import numpy as np

li_arr = [i for i in range(0, 100)]
np_arr = np.arange(100)

print(li_arr)
print(np_arr)

[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99]
[ 0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23
 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47
 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71
 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95
 96 97 98 99]


In [2]:
## Size of one element in numpy array
print(np_arr.itemsize)

## Size of 100 elements in numpy array
print(np_arr.itemsize * np_arr.size)

8
800


In [3]:
import sys

a = 10

## Size req of 1 element in list
print(sys.getsizeof(a))

## Size of 100 elements in list
print(sys.getsizeof(1) * len(li_arr))

28
2800


In [4]:
import numpy as np
import time

size = 100000

## Addd elements of array a and b

def addition_using_list():
    t1 = time.time() # Epoch time
    a = range(size)
    b = np.arange(size)
    c = [a[i] + b[i] for i in range(size)]
    t2 = time.time()
    return (t2 - t1)

def addtion_using_numpy():
    t1 = time.time()
    a = np.arange(size)
    b = np.arange(size)
    c = a + b
    t2 = time.time()
    return (t2 - t1)

t_list = addition_using_list()
t_numpy = addtion_using_numpy()

print('List = ', t_list * 1000) ## Time in Milliseconds
print('Numpy = ', t_numpy * 1000) ## Time in Milliseconds

List =  122.03383445739746
Numpy =  1.5785694122314453


### Why are Numpy arrays fast?
In Numpy Array operations take place in chunks rather than element wise.

For example in case of adding respective elements of two lists, the addition takes place in chunks and not one element at a time.

**What is vectorization?**

"Vectorization" (simplified) is the process of rewriting a loop so that instead of processing a single element of an array N times, it processes (say) 4 elements of the array simultaneously N/4 times.

Source: https://stackoverflow.com/questions/1422149/what-is-vectorization

### How to create Numpy arrays?

In [5]:
import numpy as np # np becomes alias for numpy
# Syntax for numpy array -> np.array(arraylike, .., )

a = [1, 2, 3]
b = np.array(a)

print(b)
print(type(b))

[1 2 3]
<class 'numpy.ndarray'>


In [6]:
a = [1, 2, 3, '5', 'a']
b = np.array(a) # Numpy array can contain only homogenous elements ie elements of same type

# Hence here each element converts to string

print(b)

['1' '2' '3' '5' 'a']


In [7]:
a = [1, 2, 3, '5']
b = np.array(a, dtype = int) # We can specify the data-type for the numpy array elements

print(b)

[1 2 3 5]


In [8]:
a = [1, 2, 3, '5']

b = np.array(a*3) # A numpy array with 3 copies

print(b)

['1' '2' '3' '5' '1' '2' '3' '5' '1' '2' '3' '5']


In [9]:
b = np.ones(3)
b

array([1., 1., 1.])

In [10]:
b = np.ones(3, dtype = int)
b

array([1, 1, 1])

In [11]:
b = np.zeros(3)
b

array([0., 0., 0.])

In [12]:
b = np.zeros(3, dtype = int)
b

array([0, 0, 0])

In [13]:
b = np.empty(2)
b

array([-5.73021895e-300,  6.90429962e-310])

### 2 D Array

In [14]:
b = np.ones((2, 4), dtype = int)
b

array([[1, 1, 1, 1],
       [1, 1, 1, 1]])

In [15]:
b = np.full(3, 5) # dtype be default integer
b

array([5, 5, 5])

In [16]:
b = np.full(3, 5, dtype = float)
b

array([5., 5., 5.])

In [17]:
b = np.full((3, 4), 5)
b

array([[5, 5, 5, 5],
       [5, 5, 5, 5],
       [5, 5, 5, 5]])

### numpy.arange

**numpy.arange([start, ]stop, [step, ]dtype=None)** 

Return evenly spaced values within a given interval. Values are generated within the half-open interval [start, stop).

For integer arguments the function is equivalent to the Python built-in range function, but returns an ndarray rather than a list. When using a non-integer step, such as 0.1, the results will often not be consistent. It is better to use numpy.linspace for these cases.

**Parameters**

***start : number, optional***

Start of interval. The interval includes this value. **The default start value is 0.**

***stop : number***

End of interval. The interval does not include this value, except in some cases where step is not an integer and floating point round-off affects the length of out.

***step : number, optional***

Spacing between values. For any output out, this is the distance between two adjacent values, out[i+1] - out[i]. The default step size is 1. If step is specified as a position argument, start must also be given.

***dtype : dtype***

The type of the output array. If dtype is not given, infer the data type from the other input arguments.

**Returns**

***arange : ndarray***

Array of evenly spaced values.

For floating point arguments, the length of the result is ceil((stop - start)/step). Because of floating point overflow, this rule may result in the last element of out being greater than stop.

In [18]:
b = np.arange(10)
b

array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

In [19]:
b = np.arange(2, 10)
b

array([2, 3, 4, 5, 6, 7, 8, 9])

In [20]:
b = np.arange(2, 20, 2)
b

array([ 2,  4,  6,  8, 10, 12, 14, 16, 18])

### numpy.linspace

**numpy.linspace(start, stop, num=50, endpoint=True, retstep=False, dtype=None, axis=0)**

Return evenly spaced numbers over a specified interval.

Returns num evenly spaced samples, calculated over the interval [start, stop].

The endpoint of the interval can optionally be excluded.

**Parameters**

***start : array_like***

The starting value of the sequence.

***stop : array_like***

The end value of the sequence, unless endpoint is set to False. In that case, the sequence consists of all but the last of num + 1 evenly spaced samples, so that stop is excluded. Note that the step size changes when endpoint is False.

***num : int, optional***

Number of samples to generate. Default is 50. Must be non-negative.

***endpoint : bool, optional***

If True, stop is the last sample. Otherwise, it is not included. Default is True.

***retstep : bool, optional***

If True, return (samples, step), where step is the spacing between samples.

***dtype : dtype, optional***

The type of the output array. If dtype is not given, infer the data type from the other input arguments.

***axis : int, optional***

The axis in the result to store the samples. Relevant only if start or stop are array-like. By default (0), the samples will be along a new axis inserted at the beginning. Use -1 to get an axis at the end.


**Returns:**

***samples : ndarray***

There are num equally spaced samples in the closed interval [start, stop] or the half-open interval [start, stop) (depending on whether endpoint is True or False).

***step : float, optional***

Only returned if retstep is True. Size of spacing between samples.

In [21]:
b = np.linspace(2, 10)
b

array([ 2.        ,  2.16326531,  2.32653061,  2.48979592,  2.65306122,
        2.81632653,  2.97959184,  3.14285714,  3.30612245,  3.46938776,
        3.63265306,  3.79591837,  3.95918367,  4.12244898,  4.28571429,
        4.44897959,  4.6122449 ,  4.7755102 ,  4.93877551,  5.10204082,
        5.26530612,  5.42857143,  5.59183673,  5.75510204,  5.91836735,
        6.08163265,  6.24489796,  6.40816327,  6.57142857,  6.73469388,
        6.89795918,  7.06122449,  7.2244898 ,  7.3877551 ,  7.55102041,
        7.71428571,  7.87755102,  8.04081633,  8.20408163,  8.36734694,
        8.53061224,  8.69387755,  8.85714286,  9.02040816,  9.18367347,
        9.34693878,  9.51020408,  9.67346939,  9.83673469, 10.        ])

In [22]:
print(b[2] - b[1]) # Checking step
print(b[4] - b[3])

0.16326530612244872
0.16326530612244872


In [23]:
b = np.linspace(2, 10, 5, dtype = int) # endpoint by default included
b

array([ 2,  4,  6,  8, 10])

In [24]:
b = np.linspace(2, 10, 5, dtype = int, endpoint = False)
b

array([2, 3, 5, 6, 8])

In [25]:
b = np.identity(3) # Only get square matrices (n*n)
b

array([[1., 0., 0.],
       [0., 1., 0.],
       [0., 0., 1.]])

In [26]:
b = np.eye(3, 4) # Can get n*m matrices
b

array([[1., 0., 0., 0.],
       [0., 1., 0., 0.],
       [0., 0., 1., 0.]])

In [27]:
b = np.random.rand(3) # Created over range (0,1)
b

array([0.77432589, 0.2653666 , 0.83712345])

In [28]:
b = np.random.rand(3)
b

array([0.02287589, 0.47090189, 0.3055708 ])

In [29]:
b = np.random.rand(2,3)
b

array([[0.91660324, 0.7388416 , 0.80099334],
       [0.04480177, 0.45037727, 0.63534699]])

In [30]:
b = np.random.rand(10)*10
b

array([5.64148871, 1.15158223, 7.78282427, 4.2496276 , 3.40729633,
       4.29837868, 2.11136741, 4.67806762, 7.41160842, 9.97297107])

In [31]:
np.random.randint(6, size=10)

array([0, 0, 3, 3, 4, 2, 5, 2, 5, 3])

In [32]:
np.random.randint(6, size=(4,3))

array([[1, 1, 2],
       [1, 5, 3],
       [3, 3, 4],
       [0, 3, 1]])

### Indexing and Sliing in Numpy Array
* Numpy array is a collection of references which point to 4 different attributes.
    1. data => reference to first byte/element of the array
    2. shape => represents size of the array
    * dtype => represents dtype of elements present in array
    * strides => represent number bytes to be skipped to get to next element

In [33]:
li = [1, 2, 3, 4, 5]
arr = np.array(li)

print(li)
print(arr)

[1, 2, 3, 4, 5]
[1 2 3 4 5]


In [34]:
print(arr.data)
print(arr.shape)
print(arr.dtype)
print(arr.strides)

<memory at 0x7f18a10f0d08>
(5,)
int64
(8,)


In [35]:
# Accessing and slicing elements in 1D are similar in numpy arrays and lists
print(li[3])
print(arr[3])

print(li[1:4])
print(arr[1:4])

4
4
[2, 3, 4]
[2 3 4]


In [36]:
li_2d = [ [i+4*j+1 for i in range(4)] for j in range(4)]
li_2d

[[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16]]

In [37]:
arr_2d = np.array(li_2d)

print(arr_2d)

[[ 1  2  3  4]
 [ 5  6  7  8]
 [ 9 10 11 12]
 [13 14 15 16]]


In [38]:
print(arr_2d.data)
print(arr_2d.shape)
print(arr_2d.dtype)
print(arr_2d.strides) # (32, 8) 32 to skip to next row, 8 to skip to next element in a row

<memory at 0x7f18a1e9e630>
(4, 4)
int64
(32, 8)


In [39]:
# Accessing element in 2D array

print(li_2d[2][1])
print(arr_2d[2][1])

# Alternatively in Numpy we can also write as [a, b]

print(arr_2d[2, 1])

10
10
10


In [40]:
# Slicing elements from 2D array 

# Elements in a row

print(li_2d[1][:3])
print(arr_2d[1, :3])


# Elements column-wise

print("Wrong Answer", li_2d[0:4][2]) # Unable to produce desirable output

li_temp = [i[2] for i in li_2d]
print("Correct Answer", li_temp)# Correct way

# Numpy way

print("Numpy way", arr_2d[0:4, 2])

[5, 6, 7]
[5 6 7]
Wrong Answer [9, 10, 11, 12]
Correct Answer [3, 7, 11, 15]
Numpy way [ 3  7 11 15]


In [41]:
# More slicing with Numpy -> get 10, 11, 14, 15 from the array

print(arr_2d[2:4, 1:3])


[[10 11]
 [14 15]]


### Mathematical Operations - 1D

In [42]:
li = [1, 2, 3, 4, 5]
a = np.random.randint(1, 20, 5)
b = np.random.randint(1, 20, 5)

print(li)
print(a)
print(b)

[1, 2, 3, 4, 5]
[ 6 11 14  9 13]
[11  9  5  9  4]


In [43]:
# Adding 1 to every element

li = [i+1 for i in li]
li

[2, 3, 4, 5, 6]

In [44]:
# In Numpy

a = a + 1 # In case of Numpy these methods are vectorised
a

array([ 7, 12, 15, 10, 14])

In [45]:
# Element by element addition in Numpy

c = a + b
c

array([18, 21, 20, 19, 18])

In [46]:
d = a - b # Subtraction
d

array([-4,  3, 10,  1, 10])

In [47]:
e = a * b # Multiplication
e

array([ 77, 108,  75,  90,  56])

In [48]:
f = a / b # Division
f

array([0.63636364, 1.33333333, 3.        , 1.11111111, 3.5       ])

In [49]:
g = a // b # Floor Division
g

array([0, 1, 3, 1, 3])

In [50]:
h = a ** b # Exponenent
h

array([1977326743, 5159780352,     759375, 1000000000,      38416])

In [51]:
print(a)
print(a.sum()) # Sum
print(a.mean()) # Mean
print(a.min()) # min
print(a.argmin()) # index for min
print(a.max()) # max
print(a.argmax()) # index for max

[ 7 12 15 10 14]
58
11.6
7
0
15
2


In [52]:
# Relational and Logical operations

# Relational

print(a)
print(b)

[ 7 12 15 10 14]
[11  9  5  9  4]


In [53]:
a > b

array([False,  True,  True,  True,  True])

In [54]:
a < b

array([ True, False, False, False, False])

In [55]:
a == b

array([False, False, False, False, False])

In [56]:
# Logical Operations

print(np.logical_or(a, b))
print(np.logical_and(a, b))
print(np.logical_not(a))

[ True  True  True  True  True]
[ True  True  True  True  True]
[False False False False False]


### Boolean indexing in 1D Array

In [57]:
a = np.array(['t', 'b', 'r', 'd', 'e', 'f', 'z', 'a'])
b = np.random.randint(1, 20, 8)

print(a)
print(b)

['t' 'b' 'r' 'd' 'e' 'f' 'z' 'a']
[13  4  5 17 18  5 11 13]


In [58]:
print(b > 10)

[ True False False  True  True False  True  True]


In [59]:
bool_arr = b > 10

print(bool_arr)

[ True False False  True  True False  True  True]


In [60]:
new_arr = b[bool_arr]
print(new_arr)

[13 17 18 11 13]


In [61]:
# Alternatively

new_arr = b[b > 10]
print(new_arr)

[13 17 18 11 13]


In [62]:
# Elements greater than 10 and less than 18

new_arr = b[(b > 10) & (b<18) ]
new_arr

array([13, 17, 11, 13])

In [63]:
# Updating elements using the similar conditions as above

print(b)
c = b
c

[13  4  5 17 18  5 11 13]


array([13,  4,  5, 17, 18,  5, 11, 13])

In [64]:
c[:3] = 19
c

array([19, 19, 19, 17, 18,  5, 11, 13])

In [65]:
print(c)
c[c>15] = 100
c

[19 19 19 17 18  5 11 13]


array([100, 100, 100, 100, 100,   5,  11,  13])

#### Using np.where() to check where a condition is satisfying

In [66]:
print(b)

[100 100 100 100 100   5  11  13]


In [67]:
# Find indexes where value is 100

print(b[b == 100])

[100 100 100 100 100]


In [68]:
# to find indexes

ind = np.where(b == 100)
ind

(array([0, 1, 2, 3, 4]),)

In [69]:
print(a)
print(b)

['t' 'b' 'r' 'd' 'e' 'f' 'z' 'a']
[100 100 100 100 100   5  11  13]


In [70]:
# Applying np.where on multiple arrays

# if (b[i] == 100):
#    print a[i]

ind = np.where(b == 100)

print(ind)
print(a[ind])

(array([0, 1, 2, 3, 4]),)
['t' 'b' 'r' 'd' 'e']


In [71]:
type(ind) # numpy.where returns a tuple because each element of the tuple refers to a dimension.

# https://stackoverflow.com/questions/50646102/what-is-the-purpose-of-numpy-where-returning-a-tuple?noredirect=1&lq=1

tuple

### Boolean in 2D Array

In [72]:
a = np.random.randint(1, 30, (5, 6))
print(a)

[[15  1  2 28 14  2]
 [ 4 26 29 27 17  9]
 [ 1 12  9 14  5 10]
 [25  6 12 14 13  9]
 [18  3  3 14  3  5]]


In [73]:
a > 20

array([[False, False, False,  True, False, False],
       [False,  True,  True,  True, False, False],
       [False, False, False, False, False, False],
       [ True, False, False, False, False, False],
       [False, False, False, False, False, False]])

In [74]:
bool_arr = a > 20
bool_arr

array([[False, False, False,  True, False, False],
       [False,  True,  True,  True, False, False],
       [False, False, False, False, False, False],
       [ True, False, False, False, False, False],
       [False, False, False, False, False, False]])

In [75]:
ans = a[bool_arr]
ans

array([28, 26, 29, 27, 25])

In [76]:
# if (a[i][j] > 100)
#    a[i][j] = 100

b = a
print(b)

[[15  1  2 28 14  2]
 [ 4 26 29 27 17  9]
 [ 1 12  9 14  5 10]
 [25  6 12 14 13  9]
 [18  3  3 14  3  5]]


In [77]:
b[bool_arr] = 100
b

array([[ 15,   1,   2, 100,  14,   2],
       [  4, 100, 100, 100,  17,   9],
       [  1,  12,   9,  14,   5,  10],
       [100,   6,  12,  14,  13,   9],
       [ 18,   3,   3,  14,   3,   5]])

In [78]:
# Working on specific column 

print(b)

[[ 15   1   2 100  14   2]
 [  4 100 100 100  17   9]
 [  1  12   9  14   5  10]
 [100   6  12  14  13   9]
 [ 18   3   3  14   3   5]]


In [79]:
# Update value for column 3, update it to 90

bool_arr = b[:, 3] == 100
print(bool_arr)

[ True  True False False False]


In [80]:
b[bool_arr, 3] = 99
print(b)

[[ 15   1   2  99  14   2]
 [  4 100 100  99  17   9]
 [  1  12   9  14   5  10]
 [100   6  12  14  13   9]
 [ 18   3   3  14   3   5]]


## Broadcasting

* The term broadcasting describes how numpy treats arrays with different shapes during arithmetic operations. 

* Subject to certain constraints, the smaller array is “broadcast” across the larger array so that they have compatible shapes. 

* **Broadcasting provides a means of vectorizing array operations so that looping occurs in C instead of Python. It does this without making needless copies of data and usually leads to efficient algorithm implementations.** 

* There are, however, cases where broadcasting is a bad idea because it leads to inefficient use of memory that slows computation.

### Inorder to have compatible dimensions there are two rules
* Dimensions are equal (eg A.dim -> 3, 2 and B.dim -> 3, 2)
* One of them is one (eg A.dim -> 3, 3 and B.dim -> 3,)

Also,
 
 _Arrays do not need to have the same number of dimensions. For example, if you have a 256x256x3 array of RGB values, and you want to scale each color in the image by a different value, you can multiply the image by a one-dimensional array with 3 values. Lining up the sizes of the trailing axes of these arrays according to the broadcast rules, shows that they are compatible_

Eg:
* Image  (3d array): 256 x 256 x 3
  Scale  (1d array):             3
  Result (3d array): 256 x 256 x 3

In [81]:
x = np.random.randint(1, 10, (3, 3))
y = np.random.randint(1, 10, (3, 3))

print(x)
print(y)

[[2 7 8]
 [9 7 4]
 [2 5 3]]
[[9 3 6]
 [1 6 9]
 [4 2 7]]


In [82]:
ans = x - y
print(ans)

[[-7  4  2]
 [ 8  1 -5]
 [-2  3 -4]]


In [83]:
x = np.random.randint(1, 10, (3, 3))
y = np.random.randint(1, 10, (3))

print(x)
print(y)

[[8 1 6]
 [7 4 3]
 [2 4 8]]
[9 1 2]


In [84]:
ans = x - y
print(ans)

[[-1  0  4]
 [-2  3  1]
 [-7  3  6]]


In [85]:
x = np.random.randint(1, 10, (3, 2))
y = np.random.randint(1, 10, (2, 3))
y = np.transpose(y)
print(x)
print(y)

[[2 4]
 [5 4]
 [8 5]]
[[2 1]
 [8 4]
 [7 9]]


In [86]:
ans = x - y
ans

array([[ 0,  3],
       [-3,  0],
       [ 1, -4]])

In [87]:
x = np.random.randint(1, 10, (3, 2))
y = np.random.randint(1, 10, (2, 3))
y = np.transpose(y)
print(x)
print(y)

[[2 7]
 [1 1]
 [2 2]]
[[2 8]
 [2 8]
 [1 2]]


In [88]:
ans = x - y
ans

array([[ 0, -1],
       [-1, -7],
       [ 1,  0]])

In [89]:
# Resize and Reshape 

# Reshape does not alter data, Resize alters data


a = np.random.randint(5, 20, 10)
print(a)

a = a.reshape((5,2))
print(a)

[ 9 14 14 15 10 11 16 17 19  9]
[[ 9 14]
 [14 15]
 [10 11]
 [16 17]
 [19  9]]


In [90]:
# reshape

# If the new array is larger than the original array, then the new array is filled with repeated copies of a. Note that this behavior is different from a.resize(new_shape) which fills with zeros instead of repeated copies of a.

a = np.random.randint(5, 20, 10)

print(a)

a = np.resize(a,(4, 3))
a

[18 18 18 19 18 15 17 18  8 11]


array([[18, 18, 18],
       [19, 18, 15],
       [17, 18,  8],
       [11, 18, 18]])