# NumPy

NumPy is a Python library used for working with arrays.

It also has functions for working in the domains of linear algebra, Fourier transforms, and matrices.

In Python we have lists that serve the purpose of arrays, but they are slow to process.

NumPy aims to provide an array object that is up to 50x faster than traditional Python lists.

The array object in NumPy is called ndarray. It provides many supporting functions that make working with ndarray very easy.

Arrays are very frequently used in data science, where speed and resources are very important.
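
The speed claim can be checked with a rough timing sketch like the one below. The array size and repeat count are arbitrary choices made for illustration, and the measured ratio depends on the machine, the NumPy version, and the operation, so the figures are an indication rather than a benchmark.

```python
import timeit
import numpy as np

n = 100_000                      # arbitrary size chosen for this sketch
py_list = list(range(n))
np_arr = np.arange(n)

# Double every element: a Python-level loop versus one vectorized expression.
list_result = [v * 2 for v in py_list]
array_result = np_arr * 2

list_time = timeit.timeit(lambda: [v * 2 for v in py_list], number=20)
array_time = timeit.timeit(lambda: np_arr * 2, number=20)

print(f"list:  {list_time:.4f} s")
print(f"array: {array_time:.4f} s")
```

Both approaches produce the same values; only the time taken differs.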

In [1]:
import numpy as np
from numpy import random 

In [3]:
a = np.array([1,2,3,4,5])
a

array([1, 2, 3, 4, 5])

In [4]:
print(np.__version__)

1.16.5


0-D, 1-D, 2-D, and 3-D arrays

In [5]:
a = np.array(42)
b = np.array([1, 2, 3, 4, 5])
c = np.array([[1, 2, 3], [4, 5, 6]])
d = np.array([[[1, 2, 3], [4, 5, 6]], [[1, 2, 3], [4, 5, 6]]])

print(a)
print(b)
print(c)
print(d)
print(a.ndim)
print(b.ndim)
print(c.ndim)
print(d.ndim)

42
[1 2 3 4 5]
[[1 2 3]
 [4 5 6]]
[[[1 2 3]
  [4 5 6]]

 [[1 2 3]
  [4 5 6]]]
0
1
2
3


In [6]:
print('c(2,2): ',c[1][1] )

c(2,2):  5


SLICING ARRAYS

Slicing in Python means taking elements from one given index to another given index.

We pass a slice instead of an index, like this: [start:end]. The element at end is not included.

We can also define the step, like this: [start:end:step].

If we don't pass start, it is considered 0.

If we don't pass end, it is considered the length of the array in that dimension.

If we don't pass step, it is considered 1.

Use the minus operator to refer to an index from the end.

In [7]:
print(b[1:4])
print(b[1:])
print(b[:4])
print(b[1:4:2])
print(b[-3:-1])

[2 3 4]
[2 3 4 5]
[1 2 3 4]
[2 4]
[3 4]
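
The defaults listed above can also be combined. The following is a small sketch that reuses the array b from the earlier cell; the variable names are only illustrative.

```python
import numpy as np

b = np.array([1, 2, 3, 4, 5])

whole = b[:]             # no start, end, or step: the whole array
every_second = b[::2]    # start and end omitted, step 2
last_two = b[-2:]        # negative start counts from the end
reversed_b = b[::-1]     # a negative step walks the array backwards

print(whole)         # [1 2 3 4 5]
print(every_second)  # [1 3 5]
print(last_two)      # [4 5]
print(reversed_b)    # [5 4 3 2 1]
```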


FOR 2D ARRAYS

Slicing a 2D array uses the form [a, b]:

a - the row(s) to be chosen, either a single index or a range

b - the element(s) to be chosen within each selected row (the columns), either a single index or a range

In [28]:
arr = np.array([[1, 2, 3, 4, 5], [6, 7, 8, 9, 10],[11,12,13,14,15]])
print(arr[1, 1:4])
print(arr[0:2, 2])
print(arr[0:2, 1:4])
print(arr[0:,1:4])

[7 8 9]
[3 8]
[[2 3 4]
 [7 8 9]]
[[ 2  3  4]
 [ 7  8  9]
 [12 13 14]]
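
Whether a single index or a range is used also affects the number of dimensions of the result. The sketch below, which reuses the same arr, shows the difference by printing shapes.

```python
import numpy as np

arr = np.array([[1, 2, 3, 4, 5], [6, 7, 8, 9, 10], [11, 12, 13, 14, 15]])

row = arr[1, :]        # single row index: the result is 1-D
row_2d = arr[1:2, :]   # one-row range: the result stays 2-D
col = arr[:, 2]        # single column index: the result is 1-D

print(row.shape)     # (5,)
print(row_2d.shape)  # (1, 5)
print(col)           # [ 3  8 13]
```

A single index removes that dimension, while a range keeps it even when it selects only one row.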


Data types in NumPy

i - integer

b - boolean

u - unsigned integer

f - float

c - complex float

m - timedelta

M - datetime

O - object

S - byte string

U - unicode string

V - fixed chunk of memory for other type ( void )
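
As a quick check of these codes, the dtype.kind attribute of an array reports the one-character code of its data type. The sample values below are made up for illustration.

```python
import numpy as np

samples = {
    "integer": np.array([1, 2, 3]),
    "float": np.array([1.5, 2.5]),
    "boolean": np.array([True, False]),
    "complex": np.array([1 + 2j]),
    "unicode string": np.array(["a", "bc"]),
    "byte string": np.array([b"a", b"bc"]),
    "datetime": np.array(["2020-01-01"], dtype="datetime64[D]"),
}

# Collect the one-character kind code for each sample array.
kinds = {label: arr.dtype.kind for label, arr in samples.items()}

for label, kind in kinds.items():
    print(f"{label:15s} {kind}")
```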

In [9]:
arr = np.array([1, 2, 3, 4])

print(arr)
print(arr.dtype)

[1 2 3 4]
int32


In [10]:
arr = np.array([1, 2, 3, 4], dtype='S')

print(arr)
print(arr.dtype)

[b'1' b'2' b'3' b'4']
|S1


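We can convert the data type of an existing array with astype(), which returns a new array. Converting floats to integers drops the fractional part.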

In [11]:
arr = np.array([1.1, 2.1, 3.1])

newarr = arr.astype('i')

print(newarr)
print(newarr.dtype)

[1 2 3]
int32


In [12]:
arr = np.array([1, 0, 3])

newarr = arr.astype(bool)

print(newarr)
print(newarr.dtype)

[ True False  True]
bool


The Difference Between Copy and View

The main difference between a copy and a view of an array is that the copy is a new array, and the view is just a view of the original array.

The copy owns the data and any changes made to the copy will not affect original array, and any changes made to the original array will not affect the copy.

The view does not own the data and any changes made to the view will affect the original array, and any changes made to the original array will affect the view.

In [13]:
arr = np.array([1, 2, 3, 4, 5])
x = arr.copy()
y = arr.view()
arr[0] = 42

print(arr)
print(x)
print(y)

arr[1] = 45
x[0] = 31

print(arr)
print(x)
print(y)

[42  2  3  4  5]
[1 2 3 4 5]
[42  2  3  4  5]
[42 45  3  4  5]
[31  2  3  4  5]
[42 45  3  4  5]
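
One way to check whether an array owns its data is the base attribute: it is None for an array that owns its data and refers to the original array for a view. The sketch below also shows that basic slicing returns a view.

```python
import numpy as np

arr = np.array([1, 2, 3, 4, 5])
x = arr.copy()
y = arr.view()
s = arr[1:4]       # basic slicing also returns a view

print(x.base is None)            # True: the copy owns its data
print(y.base is arr)             # True: the view borrows the data of arr
print(s.base is arr)             # True
print(np.shares_memory(arr, s))  # True
print(np.shares_memory(arr, x))  # False
```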


In [14]:
arr = np.array([[1, 2, 3, 4], [5, 6, 7, 8]])

print(arr.shape)

(2, 4)


RESHAPE

For 1D to 2D with reshape(4, 3): the outermost dimension will have 4 arrays, each with 3 elements.

For 1D to 3D with reshape(2, 3, 2): the outermost dimension will have 2 arrays, each containing 3 arrays of 2 elements.

-1 marks an unknown dimension, and NumPy calculates it for us (we cannot pass -1 for more than one dimension).

Flattening an array means converting a multidimensional array into a 1D array.

NumPy has many functions for changing the shape of arrays, such as flatten() and ravel(), and for rearranging the elements, such as rot90(), flip(), fliplr(), and flipud().

In [31]:
arr = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12])

newarr = arr.reshape(4, 3)
newarrr = arr.reshape(2, 3, 2)
nayarr = arr.reshape(3,-1,4)

print(newarr)
print(newarrr)
print(nayarr)

[[ 1  2  3]
 [ 4  5  6]
 [ 7  8  9]
 [10 11 12]]
[[[ 1  2]
  [ 3  4]
  [ 5  6]]

 [[ 7  8]
  [ 9 10]
  [11 12]]]
[[[ 1  2  3  4]]

 [[ 5  6  7  8]]

 [[ 9 10 11 12]]]


In [34]:
arr = np.array([1, 2, 3, 4, 5, 6, 7, 8])

print(arr.reshape(2, 4))

[[1 2 3 4]
 [5 6 7 8]]


In [17]:
arr = np.array([1, 2, 3, 4, 5, 6, 7, 8])

newarr = arr.reshape(2, 2, -1)

print(newarr)

[[[1 2]
  [3 4]]

 [[5 6]
  [7 8]]]


In [37]:
arr = np.array([[1, 2, 3], [4, 5, 6]])

newarr = arr.reshape(-1)

print(newarr)

[1 2 3 4 5 6]
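
The flatten() and ravel() functions mentioned earlier both produce a 1D array, but flatten() always copies while ravel() returns a view when the memory layout allows it, as it does for the array in this sketch. The flip functions are shown alongside for comparison.

```python
import numpy as np

arr = np.array([[1, 2, 3], [4, 5, 6]])

flat_copy = arr.flatten()   # always a new array
flat_view = arr.ravel()     # a view here, because arr is stored contiguously

flat_copy[0] = 99           # does not touch arr
flat_view[1] = 77           # writes through to arr

print(arr)                  # arr[0, 1] is now 77, arr[0, 0] is still 1

left_right = np.fliplr(arr)   # reverse the column order
up_down = np.flipud(arr)      # reverse the row order
rotated = np.rot90(arr)       # rotate 90 degrees counterclockwise

print(left_right)
print(up_down)
print(rotated)
```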


In [21]:
arr = np.array([[[1, 2, 3], [4, 5, 6]], [[7, 8, 9], [10, 11, 12]]])

for x in arr:
    for y in x:
        for z in y:
            print(z)

1
2
3
4
5
6
7
8
9
10
11
12


The function nditer() is a helper that supports everything from very basic to very advanced iteration. It solves some basic issues we face when iterating, such as needing one nested for loop per dimension. Let's go through it with examples.

In [22]:
arr = np.array([[[1, 2], [3, 4]], [[5, 6], [7, 8]]])

for x in np.nditer(arr):
    print(x)

1
2
3
4
5
6
7
8


We can use the op_dtypes argument and pass it the expected data type to change the data type of the elements while iterating.

NumPy does not change the data type of an element in place (where the element sits in the array), so it needs some extra space to perform the conversion. That extra space is called a buffer, and to enable it in nditer() we pass flags=['buffered'].

In [38]:
arr = np.array([1, 2, 3])

for x in np.nditer(arr, flags=['buffered'], op_dtypes=['S']):
    print(x,x.dtype)

b'1' |S11
b'2' |S11
b'3' |S11
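
The same pattern works for numeric conversions. The sketch below iterates over an integer array as float64 values; the halves list is only there to collect the results for illustration.

```python
import numpy as np

arr = np.array([1, 2, 3])

halves = []
for x in np.nditer(arr, flags=['buffered'], op_dtypes=['float64']):
    # x is a zero-dimensional float64 array taken from the buffer
    halves.append(float(x) / 2)

print(halves)      # [0.5, 1.0, 1.5]
print(arr.dtype)   # still an integer type: the original array is unchanged
```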


In [40]:
arr = np.array([[1, 2, 3, 4], [5, 6, 7, 8]])

for x in np.nditer(arr[0:, ::2]):     # 0: keeps every row, ::2 takes every second element of each row
    print(x)

1
3
5
7


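In [3]:
arr = np.array([[1, 2, 3], [4, 5, 6]])   # both rows go inside one outer list; a second positional argument would be read as the dtype

for index, x in np.ndenumerate(arr):    # ndenumerate gives the index and the value
    print(index, x)

(0, 0) 1
(0, 1) 2
(0, 2) 3
(1, 0) 4
(1, 1) 5
(1, 2) 6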

In [26]:
arr = np.array([[1, 2, 3, 4], [5, 6, 7, 8]])

for idx, x in np.ndenumerate(arr):
    print(idx, x)

(0, 0) 1
(0, 1) 2
(0, 2) 3
(0, 3) 4
(1, 0) 5
(1, 1) 6
(1, 2) 7
(1, 3) 8


JOINING ARRAYS

Stacking is the same as concatenation, except that stacking is done along a new axis.

concatenate() joins arrays along an axis that already exists, so two 1-D arrays can only be joined end to end. To put two 1-D arrays one over the other, or side by side as columns, we use stack(), which creates the new axis.

We pass a sequence of arrays that we want to join to concatenate(), along with the axis. If axis is not explicitly passed, it is taken as 0.

NumPy also has vstack() and hstack() for vertical and horizontal stacking without the axis argument, and dstack() for stacking along depth.

In [42]:
arr1 = np.array([1, 2, 3])

arr2 = np.array([4, 5, 6])

arr = np.concatenate((arr1, arr2))

print(arr)

[1 2 3 4 5 6]


In [49]:
arr1 = np.array([[1, 2], [3, 4]])

arr2 = np.array([[5, 6], [7, 8]])

arr = np.concatenate((arr1, arr2), axis=0)     # axis=0: the rows of arr2 are appended below the rows of arr1

arrr = np.concatenate((arr1, arr2), axis=1)    # axis=1: the columns of arr2 are appended to the right of arr1
print(arr)
print(arrr)

[[1 2]
 [3 4]
 [5 6]
 [7 8]]
[[1 2 5 6]
 [3 4 7 8]]


In [53]:
arr1 = np.array([1, 2, 3])

arr2 = np.array([4, 5, 6])

arr = np.stack((arr1, arr2), axis=1)

print(arr)

[[1 4]
 [2 5]
 [3 6]]


In [54]:
arr1 = np.array([1, 2, 3])

arr2 = np.array([4, 5, 6])

arr = np.dstack((arr1, arr2))

print(arr)

[[[1 4]
  [2 5]
  [3 6]]]
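
For completeness, here is a short sketch of hstack() and vstack() on the same two 1-D arrays. For 1-D inputs, vstack() gives the same result as stack() with axis=0.

```python
import numpy as np

arr1 = np.array([1, 2, 3])
arr2 = np.array([4, 5, 6])

h = np.hstack((arr1, arr2))          # joined end to end: shape (6,)
v = np.vstack((arr1, arr2))          # one over the other: shape (2, 3)
s0 = np.stack((arr1, arr2), axis=0)  # stacking along a new first axis

print(h)
print(v)
print(np.array_equal(v, s0))         # True
```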


SPLITTING ARRAYS

We use array_split() for splitting arrays. We pass it the array we want to split and the number of splits. If the array cannot be divided evenly, the earlier pieces get one extra element.

hsplit(), vsplit(), and dsplit() are the opposites of hstack(), vstack(), and dstack().

In [58]:
arr = np.array([1, 2, 3, 4, 5, 6])

newarr = np.array_split(arr, 3)
newarr1 = np.array_split(arr, 4)

print(newarr)
print(newarr1)

[array([1, 2]), array([3, 4]), array([5, 6])]
[array([1, 2]), array([3, 4]), array([5]), array([6])]


In [66]:
arr = np.array([[1,2] , [2,3] , [3,4] , [1,2] , [2,3] , [3,4]])

newarr = np.array_split(arr,3)

newarr1 = np.array_split(arr , 2 , axis = 1)

print(newarr[1])
print(newarr1)

[[3 4]
 [1 2]]
[array([[1],
       [2],
       [3],
       [1],
       [2],
       [3]]), array([[2],
       [3],
       [4],
       [2],
       [3],
       [4]])]


In [68]:
arr = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9], [10, 11, 12], [13, 14, 15], [16, 17, 18]])

newarr = np.vsplit(arr, 3)

print(newarr)

[array([[1, 2, 3],
       [4, 5, 6]]), array([[ 7,  8,  9],
       [10, 11, 12]]), array([[13, 14, 15],
       [16, 17, 18]])]
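
NumPy also has split(), which differs from array_split() in that it raises an error when the array cannot be divided into equal parts. The sketch below shows that difference, and then uses hsplit() to undo an hstack(); the grid array is made up for illustration.

```python
import numpy as np

arr = np.array([1, 2, 3, 4, 5, 6])

uneven = np.array_split(arr, 4)      # uneven piece sizes are allowed
print(uneven)

split_failed = False
try:
    np.split(arr, 4)                 # 6 elements cannot form 4 equal parts
except ValueError as exc:
    split_failed = True
    print("split failed:", exc)

grid = np.array([[1, 2, 3, 4], [5, 6, 7, 8]])
left, right = np.hsplit(grid, 2)     # split the columns into two halves
print(left)
print(right)
print(np.array_equal(np.hstack((left, right)), grid))   # True
```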


SEARCH

In [70]:
arr = np.array([1, 2, 3, 4, 5, 4, 4])

x = np.where(arr == 4)
y = np.where(arr%2 == 0)       # where() returns the indexes of the matching elements, not the values
print(x)
print(y)

(array([3, 5, 6], dtype=int64),)
(array([1, 3, 5, 6], dtype=int64),)


In [71]:
arr = np.array([6, 7, 8, 9])

x = np.searchsorted(arr, 7)

print(x)

1


In [73]:
arr = np.array([1, 3, 5, 7])

x = np.searchsorted(arr, [2, 4, 6])      # returns the indexes where these values should be inserted to keep the array sorted

print(x)

[1 2 3]
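
When the value being searched for already exists in the array, the side argument decides whether the returned index falls before or after the existing entry. A small sketch, with np.insert() used only to show the effect:

```python
import numpy as np

arr = np.array([6, 7, 8, 9])

left = np.searchsorted(arr, 7)                 # default side='left': before the existing 7
right = np.searchsorted(arr, 7, side='right')  # after the existing 7

inserted = np.insert(arr, left, 7)             # returns a new array; arr is unchanged

print(left, right)   # 1 2
print(inserted)      # [6 7 7 8 9]
```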


In [74]:
arr = np.array([3, 2, 0, 1])

print(np.sort(arr))

[0 1 2 3]


ARRAY FILTER

In [75]:
arr = np.array([41, 42, 43, 44])

x = [True, False, True, False]

newarr = arr[x]     # the new array contains only the values where the filter had the value True
print(newarr)       # in this case, index 0 and 2

[41 43]


In [76]:
arr = np.array([41, 42, 43, 44])

# Create an empty list
filter_arr = []

# go through each element in arr
for element in arr:
    # if the element is higher than 42, set the value to True, otherwise False:
    if element > 42:
        filter_arr.append(True)
    else:
        filter_arr.append(False)

newarr = arr[filter_arr]

print(filter_arr)
print(newarr)

[False, False, True, True]
[43 44]


In [79]:
#Array filter for even numbers
arr = np.array([1,2,3,4,5,6,7,8])

filter_array = []

for element in arr:
    if element%2 ==0:
        filter_array.append(True)
    else:
        filter_array.append(False)

newarr = arr[filter_array] 

print(filter_array)
print(newarr)

[False, True, False, True, False, True, False, True]
[2 4 6 8]


In [82]:
arr = np.array([41, 42, 43, 44])

filter_arr = arr > 42
filter_array = arr%2 ==0

newarr = arr[filter_arr]
newarr1 = arr[filter_array]

print(filter_arr)
print(newarr)
print(newarr1)

[False False  True  True]
[43 44]
[42 44]
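
Filter arrays built this way can be combined with the element-wise operators & (and), | (or), and ~ (not). Each comparison needs its own parentheses because these operators bind more tightly than the comparisons. A short sketch:

```python
import numpy as np

arr = np.array([41, 42, 43, 44])

both = arr[(arr > 41) & (arr % 2 == 0)]     # greater than 41 AND even
either = arr[(arr < 42) | (arr % 2 == 0)]   # less than 42 OR even
odd = arr[~(arr % 2 == 0)]                  # NOT even

print(both)    # [42 44]
print(either)  # [41 42 44]
print(odd)     # [41 43]
```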


RANDOM

The choice() method allows you to generate a random value based on an array of values.

The choice() method takes an array as a parameter and randomly returns one of the values.

In [86]:
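a = random.randint(100)                   # random integer from 0 to 99
b = random.rand()                         # random float in the range [0, 1)
c = random.randint(100, size=5)           # 1-D array of 5 random integers
d = random.randint(100, size=(3, 5))      # 2-D array with 3 rows of 5 random integers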


print(a)
print(b)
print(c)
print(d)

80
0.33072355139182297
[46 19 54 22 11]
[[51 94 52 19 93]
 [72  4 94 99  0]
 [35 40 33 27 24]]


In [89]:
x = random.rand(3)
y = random.rand(3, 5)

print(x)
print(y)

[0.83089442 0.7981464  0.85985122]
[[0.6076549  0.08837713 0.26924859 0.39532409 0.73623756]
 [0.88023199 0.28537106 0.61801587 0.6081591  0.44915115]
 [0.00646866 0.96834649 0.99257555 0.99740745 0.39915131]]


In [90]:
x = random.choice([3, 5, 7, 9])
y = random.choice([3, 5, 7, 9], size=(3, 5))
print(x)
print(y)

9
[[9 9 7 7 3]
 [5 7 9 3 7]
 [9 5 3 7 5]]
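
choice() also accepts a p argument that gives the probability of each value; the probabilities must sum to 1. In the sketch below the probabilities are arbitrary, and 9 is given probability 0, so it should never be drawn. The printed sample differs from run to run.

```python
import numpy as np
from numpy import random

values = [3, 5, 7, 9]

# One probability per value; here 9 can never be chosen.
sample = random.choice(values, p=[0.1, 0.3, 0.6, 0.0], size=100)

print(sample[:10])
print(np.unique(sample))
```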


In [91]:
x = [1, 2, 3, 4]
y = [4, 5, 6, 7]
z = np.add(x, y)

print(z)

[ 5  7  9 11]


The frompyfunc() method takes the following arguments:

function - the name of the function.

inputs - the number of input arguments (arrays).

outputs - the number of output arrays.

In [92]:
def myadd(x, y):
    return x+y

myadd = np.frompyfunc(myadd, 2, 1)

print(myadd([1, 2, 3, 4], [5, 6, 7, 8]))

[6 8 10 12]
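
One detail worth knowing is that a function created with frompyfunc() returns arrays of dtype object, which is why the output above is printed without the usual column alignment. The sketch below converts the result with astype(); the name myadd_ufunc is chosen here only to avoid reusing the name of the plain Python function.

```python
import numpy as np

def myadd(x, y):
    return x + y

myadd_ufunc = np.frompyfunc(myadd, 2, 1)

result = myadd_ufunc([1, 2, 3, 4], [5, 6, 7, 8])
print(result.dtype)                  # object

as_int = result.astype(np.int64)     # convert to a numeric dtype
print(as_int, as_int.dtype)

# Both the custom function and the built-in add are ufunc objects.
print(isinstance(myadd_ufunc, np.ufunc), isinstance(np.add, np.ufunc))
```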
