# NumPy

### NumPy vs Lists: Mainly Performance Benefits
#### NumPy uses fewer bytes of memory per value and its data is quick to read, so it is faster from a performance standpoint
#### NumPy arrays hold a single fixed data type, so there is no per-element type checking when iterating through them
#### NumPy stores its data in contiguous memory, which lets the computer use SIMD (Single Instruction, Multiple Data) vector processing units and make effective use of the cache
   ##### For example, with dtype int8 eight values occupy 8 consecutive bytes in a NumPy array, whereas a list stores pointers to separate Python objects scattered across different memory locations
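
A minimal sketch of the memory difference, assuming a 1000-element int64 array; the names `values`, `arr` and `list_bytes` are illustrative. The exact list size depends on the Python build, so only the NumPy numbers are given as expected output.

```python
import sys
import numpy as np

n = 1000
values = list(range(n))
arr = np.arange(n, dtype=np.int64)

# NumPy: one contiguous buffer of n fixed-size elements
print(arr.nbytes)                  # 8000 (8 bytes per int64)
print(arr.flags['C_CONTIGUOUS'])   # True

# List: an array of pointers plus a separate Python int object per element
# (exact sizes depend on the Python build)
list_bytes = sys.getsizeof(values) + sum(sys.getsizeof(v) for v in values)
print(list_bytes)
```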

##### Both lists and NumPy arrays support insertion, deletion, appending, concatenation, etc., but NumPy can do a lot more. For example, multiplying two NumPy arrays multiplies them item by item, which plain lists do not support.
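
A small sketch of the item-wise difference; `list1` and `list2` are made-up example values. With lists, `*` between two lists is an error and `*` with an integer repeats the list, while NumPy applies the operation element by element.

```python
import numpy as np

list1 = [1, 3, 5]
list2 = [1, 2, 3]

# Element-wise product with plain lists needs an explicit loop (or comprehension)
list_product = [x * y for x, y in zip(list1, list2)]

# With NumPy the same operation is a single vectorized expression
arr_product = np.array(list1) * np.array(list2)

print(list_product)   # [1, 6, 15]
print(arr_product)    # [ 1  6 15]

# Note: list1 * 2 repeats the list instead of doubling its values
print(list1 * 2)            # [1, 3, 5, 1, 3, 5]
print(np.array(list1) * 2)  # [ 2  6 10]
```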

#### Applications of NumPy: 
##### Many mathematical operations (a MATLAB replacement), plotting (Matplotlib), the backend of other libraries and projects (Pandas, Connect 4, digital photography), machine learning applications, etc.

In [1]:
import numpy as np

## The Basics (creating arrays, shape, size, data type)

In [2]:
a = np.array([1,2,3])
print(a)

[1 2 3]


In [4]:
b = np.array([[1.2,2,3.3],[1,2,5]])
print(b)

[[1.2 2.  3.3]
 [1.  2.  5. ]]


In [9]:
# Get the number of dimensions
a.ndim

1

In [10]:
b.ndim

2

In [7]:
# Get the shape: the length along each dimension (rows, columns, ...)
a.shape

(3,)

In [8]:
b.shape

(2, 3)

In [11]:
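# Get the data type (the default integer dtype depends on the platform and NumPy version, so you may see int64 instead)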
a.dtype

dtype('int32')

In [18]:
# Specify a smaller dtype so each element takes 16 bits (2 bytes) instead of 32 bits (4 bytes); 1 byte = 8 bits
a = np.array([1,2,3], dtype='int16')
print(a)
a.dtype

[1 2 3]


dtype('int16')
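
A smaller dtype saves memory but narrows the range of values an element can hold. A short sketch using `np.iinfo` to list those ranges; the `ranges` dictionary is just an illustrative container.

```python
import numpy as np

# A smaller dtype saves memory but can hold a narrower range of values
ranges = {}
for dtype in ['int8', 'int16', 'int32']:
    info = np.iinfo(dtype)
    ranges[dtype] = (int(info.min), int(info.max))
print(ranges)
# {'int8': (-128, 127), 'int16': (-32768, 32767), 'int32': (-2147483648, 2147483647)}
```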

In [26]:
# Get the size of one element
a.itemsize  # number of bytes per element

2

In [24]:
# Get the total size in bytes (size = number of elements, itemsize = bytes per element)
a.size * a.itemsize

6

In [23]:
a.nbytes  # nbytes gives the same total size directly

6

In [27]:
b.itemsize  # b holds float64 values, 8 bytes each

8
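
To see how `itemsize` and `nbytes` relate across dtypes, here is a small sketch that builds a 10-element array of each type; the `sizes` dictionary is an illustrative name, and the loop checks `nbytes == size * itemsize` as it goes.

```python
import numpy as np

sizes = {}
for dtype in ['int8', 'int16', 'int32', 'int64', 'float32', 'float64']:
    arr = np.zeros(10, dtype=dtype)
    # nbytes is always the number of elements times the bytes per element
    assert arr.nbytes == arr.size * arr.itemsize
    sizes[dtype] = arr.itemsize
print(sizes)
# {'int8': 1, 'int16': 2, 'int32': 4, 'int64': 8, 'float32': 4, 'float64': 8}
```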

## Accessing/Changing Specific Elements, Rows, Columns, etc. (slicing)

In [29]:
a = np.array([[1,2,3,4,5,6,7],[8,9,10,11,12,13,14]])
print(a)

[[ 1  2  3  4  5  6  7]
 [ 8  9 10 11 12 13 14]]


In [30]:
a.size

14

In [31]:
a.shape

(2, 7)

In [34]:
# Get a specific element [row index, column index]. In Python, indexing starts at zero.
a[1,5]

13

In [36]:
a[1,-5]  # negative indices count from the end: column -5 of a 7-column row is column 2

10

In [37]:
# Get a specific row
a[0,:]

array([1, 2, 3, 4, 5, 6, 7])

In [38]:
# Get a specific column
a[:,2]

array([ 3, 10])

In [42]:
# Getting a little more fancy: [startindex:endindex:stepsize] (endindex is excluded)
a[0, 1:6:2]

array([2, 4, 6])
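
A few more slice patterns on the same 2x7 array, as a short sketch; the variable names are illustrative. Omitting start/end means "from the beginning"/"to the end", and a negative step walks backwards.

```python
import numpy as np

a = np.array([[1, 2, 3, 4, 5, 6, 7],
              [8, 9, 10, 11, 12, 13, 14]])

every_other = a[0, ::2]     # every other element of row 0
reversed_row = a[0, ::-1]   # row 0 reversed
last_two = a[:, -2:]        # last two columns of both rows

print(every_other)          # [1 3 5 7]
print(reversed_row)         # [7 6 5 4 3 2 1]
print(last_two)             # [[ 6  7]
                            #  [13 14]]
print(a[1, -5] == a[1, 2])  # True: -5 counts from the end of a 7-wide row
```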

In [44]:
#To update a specific value in array
a[1,5]=20 # changing the value of 13 to 20 in the 2nd row
print(a)

[[ 1  2  3  4  5  6  7]
 [ 8  9 10 11 12 20 14]]


In [46]:
# Replace a whole column with a single value
a[:,2] = 5  # the 3rd column becomes 5 in every row
print(a)

[[ 1  2  5  4  5  6  7]
 [ 8  9  5 11 12 20 14]]
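
The right-hand side of an assignment does not have to be a single number. A minimal sketch on a fresh copy of the 2x7 array: a list with one value per row, or a slice of matching shape, can be written into a column or row slice.

```python
import numpy as np

a = np.array([[1, 2, 3, 4, 5, 6, 7],
              [8, 9, 10, 11, 12, 13, 14]])

# Assign a different value per row: the right-hand side must match the column's length (2)
a[:, 2] = [1, 2]
print(a[:, 2])   # [1 2]

# A slice on the right-hand side can be written into a slice of the same shape
a[0, 0:3] = a[1, 0:3]
print(a[0])      # [8 9 2 4 5 6 7]
```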


#### 3-d example

In [48]:
b = np.array([[[1,2],[3,4],[5,6],[7,8]]])
print(b)

[[[1 2]
  [3 4]
  [5 6]
  [7 8]]]


In [49]:
b = np.array([[[1,2],[3,4]],[[5,6],[7,8]]])
print(b)

[[[1 2]
  [3 4]]

 [[5 6]
  [7 8]]]


In [51]:
# get specific elements (work outside in)
b[0,1,1] # get value 4

4

In [52]:
# Extract the first 2-D block (slicing with 0:1 keeps the outer dimension, so the result is still 3-D)
b[0:1:1]

array([[[1, 2],
        [3, 4]]])

In [55]:
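# Select index 1 along the middle axis, i.e. the 2nd row of each 2x2 block
b[:,0,:] #  1st row of each block (not displayed: a cell shows only its last expression)
b[:,1,:] #  2nd row of each block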

array([[3, 4],
       [7, 8]])

In [57]:
# Replace the 2nd row of each block
b[:,1,:] = [[9,9],[8,8]]   #  the 1st block's 2nd row becomes [9,9], the 2nd block's becomes [8,8]
b

array([[[1, 2],
        [9, 9]],

       [[5, 6],
        [8, 8]]])

## Initializing Different Arrays (1s, 0s, full, random, etc.)

In [58]:
# All 0s matrix
np.zeros(5)

array([0., 0., 0., 0., 0.])

In [59]:
np.zeros([2,2])

array([[0., 0.],
       [0., 0.]])

In [60]:
np.zeros([2,3,2])

array([[[0., 0.],
        [0., 0.],
        [0., 0.]],

       [[0., 0.],
        [0., 0.],
        [0., 0.]]])

In [61]:
np.zeros([2,3])

array([[0., 0., 0.],
       [0., 0., 0.]])

In [64]:
# All 1s matrix
np.ones((2,2))

array([[1., 1.],
       [1., 1.]])

In [65]:
np.ones((4,2,2))

array([[[1., 1.],
        [1., 1.]],

       [[1., 1.],
        [1., 1.]],

       [[1., 1.],
        [1., 1.]],

       [[1., 1.],
        [1., 1.]]])

In [66]:
# We can also specify the data type, e.g. to get integer 1s
np.ones((4,2,2), dtype='int32')

array([[[1, 1],
        [1, 1]],

       [[1, 1],
        [1, 1]],

       [[1, 1],
        [1, 1]],

       [[1, 1],
        [1, 1]]])

In [67]:
# Any other fill value, for example 99
np.full((2,2),99)

array([[99, 99],
       [99, 99]])

In [68]:
# Any other fill value with a float dtype, for example 99
np.full((2,2),99, dtype='float32')

array([[99., 99.],
       [99., 99.]], dtype=float32)

In [70]:
# Any other fill value, matching the shape of an existing array (full_like)
np.full_like(a,4) # a is the 2x7 array from above; the result has a's shape and dtype, filled with 4

array([[4, 4, 4, 4, 4, 4, 4],
       [4, 4, 4, 4, 4, 4, 4]])
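
`full_like` has siblings `zeros_like` and `ones_like` that also copy shape and dtype from an existing array. A small sketch with a hypothetical 2x3 integer array; note the fill value is cast to the array's dtype.

```python
import numpy as np

a = np.array([[1, 2, 3], [4, 5, 6]])

# The *_like functions copy shape and dtype from an existing array
z = np.zeros_like(a)
o = np.ones_like(a)
f = np.full_like(a, 4)
print(z.shape, z.dtype == a.dtype)   # (2, 3) True

# full_like casts the fill value to a's dtype: 4.7 becomes 4 for an integer array
truncated = np.full_like(a, 4.7)
print(truncated)
```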

In [72]:
# Random floats in the half-open interval [0, 1)
np.random.rand(4,2)

array([[0.39998533, 0.49272598],
       [0.49505368, 0.71289163],
       [0.73296365, 0.84339796],
       [0.21451869, 0.45198135]])

In [97]:
# Random integer values
np.random.randint(4,7,size=(3,3)) # 3x3 array of random integers from 4 up to, but not including, 7

array([[4, 4, 4],
       [6, 5, 5],
       [4, 4, 4]])
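
The outputs above change on every run. A sketch of the newer Generator API (`np.random.default_rng`), offered as an alternative to `np.random.rand`/`randint`, not as what the cells above use; the seed value 42 is arbitrary. Seeding makes the results reproducible.

```python
import numpy as np

# Seeding a Generator makes the "random" values reproducible between runs
rng = np.random.default_rng(seed=42)
floats = rng.random((4, 2))              # floats in [0.0, 1.0)
ints = rng.integers(4, 7, size=(3, 3))   # integers 4, 5 or 6 (7 is excluded)

rng_again = np.random.default_rng(seed=42)
same_floats = rng_again.random((4, 2))
print(np.array_equal(floats, same_floats))   # True
```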

In [103]:
# The identity matrix
x = np.identity(3)
print(x)

# Integer identity matrix
np.identity(3, dtype='int16')

[[1. 0. 0.]
 [0. 1. 0.]
 [0. 0. 1.]]


array([[1, 0, 0],
       [0, 1, 0],
       [0, 0, 1]], dtype=int16)

In [104]:
# Repeat each element of an array several times
arr = np.array([1,2,3]) # One dimensional array
r1 = np.repeat(arr,3,axis=0) # arguments: array, number of repeats, axis
print(r1)

[1 1 1 2 2 2 3 3 3]


In [111]:
arr = np.array([[1,2,3]]) # Two dimensional array
r1 = np.repeat(arr,3,axis=0)
print(r1)

[[1 2 3]
 [1 2 3]
 [1 2 3]]
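
`np.repeat` repeats individual elements, while `np.tile` repeats the whole array; a short sketch contrasting the two on the same values.

```python
import numpy as np

arr = np.array([1, 2, 3])
repeated = np.repeat(arr, 2)   # each element repeated
tiled = np.tile(arr, 2)        # whole array repeated
print(repeated)                # [1 1 2 2 3 3]
print(tiled)                   # [1 2 3 1 2 3]

arr2d = np.array([[1, 2, 3]])
along_columns = np.repeat(arr2d, 2, axis=1)
print(along_columns)           # [[1 1 2 2 3 3]]
```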


In [118]:
# Build a 5x5 array of 1s with a 3x3 block of 0s (and a 9 in the centre) inside
output = np.ones((5,5))
print(output)

z=np.zeros((3,3))
z[1,1]=9
print(z)

output[1:4,1:4]=z   # equivalently output[1:-1,1:-1]=z: from index 1 up to, but not including, the last index
print(output)

[[1. 1. 1. 1. 1.]
 [1. 1. 1. 1. 1.]
 [1. 1. 1. 1. 1.]
 [1. 1. 1. 1. 1.]
 [1. 1. 1. 1. 1.]]
[[0. 0. 0.]
 [0. 9. 0.]
 [0. 0. 0.]]
[[1. 1. 1. 1. 1.]
 [1. 0. 0. 0. 1.]
 [1. 0. 9. 0. 1.]
 [1. 0. 0. 0. 1.]
 [1. 1. 1. 1. 1.]]
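
The same 5x5 result can also be built by padding the 3x3 array with a border of 1s. A sketch using `np.pad` as an alternative to slice assignment, not a replacement for the cell above.

```python
import numpy as np

z = np.zeros((3, 3))
z[1, 1] = 9

# Surround z with a 1-element border of 1s
output = np.pad(z, pad_width=1, mode='constant', constant_values=1)
print(output)
```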


#### Be careful when copying arrays!!!

In [127]:
a = np.array([1,2,3])
print(a)
b = a
print(b)
b[0]=100

print(b)
print(a)
# Both a and b show the change even though only b was modified: b = a does not copy anything, it just makes b a second name for the same array.
# To get an independent array, use the copy() method instead.

c = np.array([4,5,6])
print(c)
d = c.copy()
print(d)
d[0]=120

print(d)
print(c)

[1 2 3]
[1 2 3]
[100   2   3]
[100   2   3]
[4 5 6]
[4 5 6]
[120   5   6]
[4 5 6]
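
Slicing has the same pitfall: a basic slice is a view onto the original data, not a copy. A minimal sketch with illustrative names `s` and `c`, using `np.shares_memory` to confirm which arrays share data.

```python
import numpy as np

a = np.array([1, 2, 3, 4, 5])
s = a[1:4]          # basic slicing returns a view, not a copy
s[0] = 99
print(a)            # [ 1 99  3  4  5]  (a changed too)

c = a[1:4].copy()   # an explicit copy is independent
c[0] = -1
print(a)            # unchanged: [ 1 99  3  4  5]
print(np.shares_memory(a, s), np.shares_memory(a, c))  # True False
```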


## Basic Mathematics (arithmetic, trigonometry, etc.)

In [128]:
a = np.array([1,2,3,4,5])
print(a)

[1 2 3 4 5]


In [130]:
a + 2

array([3, 4, 5, 6, 7])

In [131]:
a - 2

array([-1,  0,  1,  2,  3])

In [132]:
a * 2

array([ 2,  4,  6,  8, 10])

In [133]:
a / 2

array([0.5, 1. , 1.5, 2. , 2.5])

In [135]:
a+=2
print(a)

[5 6 7 8 9]


In [141]:
b = np.array([1,0,1,0,0])
print(a+b)
print(a*b)

[6 6 8 8 9]
[5 0 7 0 0]


In [139]:
a**2

array([25, 36, 49, 64, 81], dtype=int32)
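
The heading also mentions trigonometry; NumPy's trig functions work element-wise on whole arrays, in radians. A short sketch; note that results are floating-point approximations, so comparisons should use `np.allclose`.

```python
import numpy as np

angles = np.array([0, np.pi / 2, np.pi])
sines = np.sin(angles)     # approximately [0, 1, 0]; sin(pi) is about 1.2e-16, not exactly 0
cosines = np.cos(angles)   # approximately [1, 0, -1]
print(sines)
print(cosines)
print(np.allclose(sines, [0, 1, 0]))  # True
```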

#### Linear Algebra

In [145]:
a = np.ones((2,3))
print(a)

b=np.full((3,2),2)
print(b)

np.matmul(a,b)

[[1. 1. 1.]
 [1. 1. 1.]]
[[2 2]
 [2 2]
 [2 2]]


array([[6., 6.],
       [6., 6.]])
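
Matrix multiplication follows a shape rule: the inner dimensions must agree, (2, 3) @ (3, 2) gives (2, 2). A sketch showing the `@` operator (equivalent to `np.matmul`) and why element-wise `*` fails for these shapes; `broadcast_failed` is just an illustrative flag.

```python
import numpy as np

a = np.ones((2, 3))
b = np.full((3, 2), 2)

# Inner dimensions must agree: (2, 3) @ (3, 2) -> (2, 2)
product = a @ b    # same as np.matmul(a, b)
print(product)     # each entry is 1*2 + 1*2 + 1*2 = 6

# * is element-wise, and shapes (2, 3) and (3, 2) cannot be broadcast together
broadcast_failed = False
try:
    a * b
except ValueError as err:
    broadcast_failed = True
    print("element-wise * failed:", err)
```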

In [147]:
# Find the determinant
c = np.identity(3)
np.linalg.det(c)

1.0
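
The identity matrix is a trivial case. A sketch with a hypothetical 2x2 matrix `m` showing the determinant, the inverse, and `np.linalg.solve` for a linear system; results are floating-point, so they are checked with tolerances rather than exact equality.

```python
import numpy as np

m = np.array([[2.0, 1.0],
              [1.0, 3.0]])
det = np.linalg.det(m)          # 2*3 - 1*1 = 5, up to floating-point rounding
m_inv = np.linalg.inv(m)
print(det)
print(np.allclose(m @ m_inv, np.identity(2)))  # True

# Solve m @ x = rhs without forming the inverse explicitly
rhs = np.array([3.0, 5.0])
x = np.linalg.solve(m, rhs)
print(x)                        # approximately [0.8, 1.4]
```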

### Statistics

In [148]:
stats = np.array([[1,2,3],[4,5,6]])
stats

array([[1, 2, 3],
       [4, 5, 6]])

In [150]:
np.min(stats)   

1

In [152]:
np.max(stats)
      

6

In [157]:
np.max(stats, axis=1)  # maximum of each row

array([3, 6])

In [155]:
np.sum(stats, axis=0)  # sum of each column

array([5, 7, 9])
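
A few more reductions on the same `stats` array, as a sketch: `axis=0` collapses the rows (one result per column), `axis=1` collapses the columns (one result per row), and no axis reduces the whole array.

```python
import numpy as np

stats = np.array([[1, 2, 3],
                  [4, 5, 6]])

col_min = np.min(stats, axis=0)      # column-wise minimum: [1 2 3]
row_sum = np.sum(stats, axis=1)      # row sums: [ 6 15]
overall_mean = np.mean(stats)        # 3.5
col_mean = np.mean(stats, axis=0)    # column means: [2.5 3.5 4.5]
row_argmax = np.argmax(stats, axis=1)  # index of each row's maximum: [2 2]
print(col_min, row_sum, overall_mean, col_mean, row_argmax)
```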

### Reorganizing Arrays

In [162]:
before = np.array([[1,2,3,4],[5,6,7,8]])
print(before)
print(before.shape)

after = before.reshape((8,1))
print(after)

after = before.reshape((4,2))
print(after)

[[1 2 3 4]
 [5 6 7 8]]
(2, 4)
[[1]
 [2]
 [3]
 [4]
 [5]
 [6]
 [7]
 [8]]
[[1 2]
 [3 4]
 [5 6]
 [7 8]]
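
When reshaping, one dimension can be given as -1 and NumPy infers it from the total element count; a shape that does not hold exactly the same number of elements raises `ValueError`. A short sketch on the same `before` array.

```python
import numpy as np

before = np.array([[1, 2, 3, 4], [5, 6, 7, 8]])

# -1 lets NumPy infer that dimension from the total number of elements (8)
cube = before.reshape((2, -1, 2))
flat = before.reshape(-1)
print(cube.shape)   # (2, 2, 2)
print(flat)         # [1 2 3 4 5 6 7 8]

# The new shape must hold exactly the same number of elements
reshape_failed = False
try:
    before.reshape((3, 3))
except ValueError as err:
    reshape_failed = True
    print("cannot reshape:", err)
```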


In [164]:
# Vertically stacking vectors/matrices
v1 = np.array([1,2,3,4])
v2 = np.array([5,6,7,8])
print(np.vstack([v1,v2]))
print(np.vstack([v1,v2,v1,v2]))


[[1 2 3 4]
 [5 6 7 8]]
[[1 2 3 4]
 [5 6 7 8]
 [1 2 3 4]
 [5 6 7 8]]


In [166]:
# Horizontally stacking vectors/matrices
h1 = np.ones((2,4))
h2 = np.zeros((2,2))
print(np.hstack([h1,h2]))
print(np.hstack([h1,h2,h1]))


[[1. 1. 1. 1. 0. 0.]
 [1. 1. 1. 1. 0. 0.]]
[[1. 1. 1. 1. 0. 0. 1. 1. 1. 1.]
 [1. 1. 1. 1. 0. 0. 1. 1. 1. 1.]]
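
Both stacking helpers can be expressed with `np.concatenate` and an explicit axis; this sketch just checks that equivalence on the same arrays (1-D vectors first need a leading axis to become rows).

```python
import numpy as np

v1 = np.array([1, 2, 3, 4])
v2 = np.array([5, 6, 7, 8])
h1 = np.ones((2, 4))
h2 = np.zeros((2, 2))

# vstack of 1-D vectors: turn each into a 1x4 row, then concatenate along axis 0
rows = np.concatenate([v1[np.newaxis, :], v2[np.newaxis, :]], axis=0)
print(np.array_equal(np.vstack([v1, v2]), rows))    # True

# hstack of 2-D arrays is concatenation along axis 1
side_by_side = np.concatenate([h1, h2], axis=1)
print(np.array_equal(np.hstack([h1, h2]), side_by_side))  # True
```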


### Miscellaneous

#### Load data from a file

In [169]:
filedata = np.genfromtxt('data.txt',delimiter=',')
filedata

array([[  1.,  13.,  21.,  11., 196.,  75.,   4.,   3.,  34.,   6.,   7.,
          8.,   0.,   1.,   2.,   3.,   4.,   5.],
       [  3.,  42.,  12.,  33., 766.,  75.,   4.,  55.,   6.,   4.,   3.,
          4.,   5.,   6.,   7.,   0.,  11.,  12.],
       [  1.,  22.,  33.,  11., 999.,  11.,   2.,   1.,  78.,   0.,   1.,
          2.,   9.,   8.,   7.,   1.,  76.,  88.]])

In [170]:
filedata.astype('int32')  # returns a new int32 array; filedata itself stays float64

array([[  1,  13,  21,  11, 196,  75,   4,   3,  34,   6,   7,   8,   0,
          1,   2,   3,   4,   5],
       [  3,  42,  12,  33, 766,  75,   4,  55,   6,   4,   3,   4,   5,
          6,   7,   0,  11,  12],
       [  1,  22,  33,  11, 999,  11,   2,   1,  78,   0,   1,   2,   9,
          8,   7,   1,  76,  88]])

### Boolean Masking & Advanced Indexing

In [171]:
filedata > 50 #Any value in the dataset greater than 50 is marked as True

array([[False, False, False, False,  True,  True, False, False, False,
        False, False, False, False, False, False, False, False, False],
       [False, False, False, False,  True,  True, False,  True, False,
        False, False, False, False, False, False, False, False, False],
       [False, False, False, False,  True, False, False, False,  True,
        False, False, False, False, False, False, False,  True,  True]])

In [172]:
filedata[filedata>50] #Extracts the values that are greater than 50

array([196.,  75., 766.,  75.,  55., 999.,  78.,  76.,  88.])
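
A boolean mask can also appear on the left-hand side of an assignment, and `np.where` chooses element-wise between two options. A sketch on a small hypothetical sample of values like those in `filedata`.

```python
import numpy as np

data = np.array([[1., 13., 196.], [766., 4., 55.]])

# Boolean mask on the left-hand side: cap every value above 50 at 50
capped = data.copy()
capped[capped > 50] = 50
print(capped)   # [[ 1. 13. 50.]
                #  [50.  4. 50.]]

# np.where picks element-wise from two options based on a condition
labels = np.where(data > 50, 'big', 'small')
print(labels)
```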

In [174]:
# You can index a NumPy array with a list of indices

a = np.array([1,2,3,4,5,6,7,8,9])
a[[1,2,8]]

array([2, 3, 9])

In [175]:
np.any(filedata>50, axis=0) # True for each column where any value is > 50

array([False, False, False, False,  True,  True, False,  True,  True,
       False, False, False, False, False, False, False,  True,  True])

In [176]:
np.all(filedata>50, axis=0) # True for each column where all values are > 50

array([False, False, False, False,  True, False, False, False, False,
       False, False, False, False, False, False, False, False, False])
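
The same checks work per row with `axis=1`, and summing a boolean mask counts its `True` values. A sketch on a small hypothetical 3x3 sample.

```python
import numpy as np

data = np.array([[1, 13, 196], [766, 4, 55], [1, 22, 999]])
mask = data > 50

any_per_row = np.any(mask, axis=1)   # [ True  True  True]
all_per_col = np.all(mask, axis=0)   # [False False  True]
total = np.sum(mask)                 # number of values > 50: 4
per_col = mask.sum(axis=0)           # count per column: [1 0 3]
print(any_per_row, all_per_col, total, per_col)
```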

In [177]:
((filedata>50)&(filedata<100))  # element-wise AND; the parentheses are required because & binds tighter than > and <

array([[False, False, False, False, False,  True, False, False, False,
        False, False, False, False, False, False, False, False, False],
       [False, False, False, False, False,  True, False,  True, False,
        False, False, False, False, False, False, False, False, False],
       [False, False, False, False, False, False, False, False,  True,
        False, False, False, False, False, False, False,  True,  True]])

In [194]:
(~((filedata>50)&(filedata<100)))  # ~ inverts (NOT) the mask

array([[ True,  True,  True,  True,  True, False,  True,  True,  True,
         True,  True,  True,  True,  True,  True,  True,  True,  True],
       [ True,  True,  True,  True,  True, False,  True, False,  True,
         True,  True,  True,  True,  True,  True,  True,  True,  True],
       [ True,  True,  True,  True,  True,  True,  True,  True, False,
         True,  True,  True,  True,  True,  True,  True, False, False]])
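
Combined masks are most useful for extracting values. A sketch on a hypothetical 1-D sample showing AND, NOT and OR (`|`), plus the equivalent function form `np.logical_and`.

```python
import numpy as np

data = np.array([1., 13., 75., 196., 55., 999., 88.])

in_range = (data > 50) & (data < 100)   # parentheses needed: & binds tighter than > and <
inside = data[in_range]                 # [75. 55. 88.]
outside = data[~in_range]               # [  1.  13. 196. 999.]
extremes = data[(data < 10) | (data > 500)]   # OR: [  1. 999.]
print(inside, outside, extremes)
print(np.array_equal(in_range, np.logical_and(data > 50, data < 100)))  # True
```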

##### Let's practice some indexing

In [188]:
a = np.array([[1,2,3,4,5],[6,7,8,9,10],[11,12,13,14,15],[16,17,18,19,20],[21,22,23,24,25],[26,27,28,29,30]])
print(a)

[[ 1  2  3  4  5]
 [ 6  7  8  9 10]
 [11 12 13 14 15]
 [16 17 18 19 20]
 [21 22 23 24 25]
 [26 27 28 29 30]]


In [190]:
a[2:4,0:2]

array([[11, 12],
       [16, 17]])

In [192]:
a[[0,1,2,3],[1,2,3,4]]  # picks the (row, column) pairs (0,1), (1,2), (2,3), (3,4)

array([ 2,  8, 14, 20])

In [193]:
a[[0,4,5],3:]  # rows 0, 4 and 5, columns 3 to the end

array([[ 4,  5],
       [24, 25],
       [29, 30]])
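
Two lists of indices select (row, column) pairs, while `np.ix_` builds a grid of every listed row combined with every listed column. A sketch that rebuilds the same 6x5 array with `np.arange` and reproduces the last result with `np.ix_`.

```python
import numpy as np

a = np.arange(1, 31).reshape(6, 5)   # same values as the 6x5 array above

# Pairwise: a list for rows and a list for columns pick (row, col) pairs
pairs = a[[0, 1, 2, 3], [1, 2, 3, 4]]
print(pairs)                          # [ 2  8 14 20]

# np.ix_ builds a grid instead: every listed row combined with every listed column
grid = a[np.ix_([0, 4, 5], [3, 4])]
print(np.array_equal(grid, a[[0, 4, 5], 3:]))  # True
```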