
## Chapter 3: NumPy for Pandas

In [2]:
#Importing numpy
import numpy as np

Benefits and characteristics of NumPy arrays:


*   Contiguous allocation in memory
*   Vectorized Operations
*   Boolean Selection
*   Sliceability
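
As a quick illustration of these four characteristics, here is a minimal sketch. The array name `arr` and the other variable names are only illustrative.

```python
import numpy as np

arr = np.arange(10)

# Contiguous allocation: a freshly created array occupies one block of memory
is_contiguous = arr.flags['C_CONTIGUOUS']

# Vectorized operation: the multiplication is applied to every element at once
doubled = arr * 2

# Boolean selection: keep only the elements where the condition is True
evens = arr[arr % 2 == 0]

# Sliceability: start:end:step, just like Python lists
middle = arr[2:8:2]

print(is_contiguous, doubled, evens, middle)
```

Each of these features is covered in more detail in the cells that follow.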



In [3]:
# A function that squares all the values in a sequence
def squares(values):
  result=[]
  for v in values:
    result.append(v*v)
  return result
to_squares=range(100000)
%timeit squares(to_squares)

Timings vary by machine; a pure-Python loop over 100,000 items typically takes on the order of milliseconds per call.


In [4]:
# The same operation using NumPy
values_to_square=np.arange(0,100000)
%timeit values_to_square**2

The slowest run took 35.29 times longer than the fastest. This could mean that an intermediate result is being cached.
10000 loops, best of 5: 126 µs per loop
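
The `%timeit` magic only works inside IPython or a notebook. As a hedged sketch, the same comparison can be run as a plain script with the standard `timeit` module. The loop here iterates over the input `values`, and the names `loop_time` and `numpy_time` are illustrative.

```python
import timeit
import numpy as np

def squares(values):
    # pure-Python loop: one multiplication and one append per element
    result = []
    for v in values:
        result.append(v * v)
    return result

to_square = range(100000)
values_to_square = np.arange(0, 100000, dtype=np.int64)

# Both approaches should produce the same numbers
same_result = squares(to_square) == (values_to_square ** 2).tolist()

# Time 10 calls of each; absolute numbers depend on the machine
loop_time = timeit.timeit(lambda: squares(to_square), number=10)
numpy_time = timeit.timeit(lambda: values_to_square ** 2, number=10)
print(same_result, loop_time, numpy_time)
```

The explicit `dtype=np.int64` is a precaution so that the largest square fits on platforms where the default integer is 32 bits.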


In [5]:
# Creating NumPy arrays and performing basic array operations

In [6]:
# a simple array
a1=np.array([1,2,3,4,5])
a1

array([1, 2, 3, 4, 5])

In [7]:
type(a1)

numpy.ndarray

In [8]:
np.size(a1)

5

In [9]:
a1.dtype

dtype('int64')

In [10]:
# a single float value makes the whole array float
a2=np.array([1,2,3,4,5.5])
a2

array([1. , 2. , 3. , 4. , 5.5])

In [11]:
a2.dtype

dtype('float64')

In [12]:
# convert back to integers; the fractional part is truncated (5.5 becomes 5)
a2=a2.astype('int64')

In [13]:
a2

array([1, 2, 3, 4, 5])
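
Note that the conversion above turned 5.5 into 5. A small sketch of the difference between truncating with `.astype()` and rounding first with `np.rint()`; the sample values are made up for illustration.

```python
import numpy as np

values = np.array([1.9, -1.9, 5.5])

# astype drops the fractional part (truncation toward zero)
truncated = values.astype('int64')

# np.rint rounds to the nearest integer first (ties go to the even value)
rounded = np.rint(values).astype('int64')

print(truncated)  # [ 1 -1  5]
print(rounded)    # [ 2 -2  6]
```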

In [14]:
# shorthand to repeat a sequence of numbers
a3=np.array([1]*10)
a3

array([1, 1, 1, 1, 1, 1, 1, 1, 1, 1])

In [15]:
# convert a python range to numpy array
a4=np.array(range(10))
a4

array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

In [16]:
# create an array of zeros
np.zeros(10)

array([0., 0., 0., 0., 0., 0., 0., 0., 0., 0.])

In [17]:
# force it to be int instead of float
np.zeros(10,dtype="int64")

array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0])

In [18]:
# make a range of 10 values starting at zero
np.arange(0,10)

array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

In [19]:
# generate the even numbers from 0 up to (but not including) 10
np.arange(0,10,2)

array([0, 2, 4, 6, 8])

In [20]:
# counting down
np.arange(10,0,-1)

array([10,  9,  8,  7,  6,  5,  4,  3,  2,  1])

In [21]:
# 11 evenly spaced values from 0 to 10, both endpoints included
np.linspace(0,10,11)

array([ 0.,  1.,  2.,  3.,  4.,  5.,  6.,  7.,  8.,  9., 10.])
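
The two range functions differ in how they treat the end value: `np.arange` takes a step and stops before the end, while `np.linspace` takes a count and includes the end by default. A short sketch of that difference; the variable names are illustrative.

```python
import numpy as np

# np.arange takes a step and excludes the stop value
by_step = np.arange(0, 10, 2)

# np.linspace takes a count and includes the stop value by default
by_count = np.linspace(0, 10, 6)

# endpoint=False excludes the stop value, as np.arange does
open_end = np.linspace(0, 10, 5, endpoint=False)

# retstep=True also returns the spacing that was used
samples, step = np.linspace(0, 1, 5, retstep=True)

print(by_step)   # [0 2 4 6 8]
print(by_count)  # [ 0.  2.  4.  6.  8. 10.]
print(open_end)  # [0. 2. 4. 6. 8.]
print(step)      # 0.25
```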

In [22]:
# multiply an array by 2
a1=np.arange(10)
print(a1)
a1*2

[0 1 2 3 4 5 6 7 8 9]


array([ 0,  2,  4,  6,  8, 10, 12, 14, 16, 18])

In [23]:
# mathematical operation between two arrays
a2=np.arange(10,20)
a1+a2

array([10, 12, 14, 16, 18, 20, 22, 24, 26, 28])
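
These operators behave differently on plain Python lists, which is easy to trip over when moving between the two. A minimal sketch of the contrast, with illustrative names:

```python
import numpy as np

py_list = [1, 2, 3]
np_arr = np.array([1, 2, 3])

# On a list, * repeats the sequence and + concatenates
list_times_two = py_list * 2
list_plus_list = py_list + py_list

# On an array, the same operators work element by element
arr_times_two = np_arr * 2
arr_plus_arr = np_arr + np_arr

print(list_times_two)  # [1, 2, 3, 1, 2, 3]
print(arr_times_two)   # [2 4 6]
```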

In [24]:
# create a two-dimensional array (2x2)
a1=np.array([[1,2],[3,4]])
a1

array([[1, 2],
       [3, 4]])

In [25]:
# a more convenient way to build a two-dimensional array is to reshape a one-dimensional one
m=np.arange(0,20).reshape(5,4)

In [26]:
m

array([[ 0,  1,  2,  3],
       [ 4,  5,  6,  7],
       [ 8,  9, 10, 11],
       [12, 13, 14, 15],
       [16, 17, 18, 19]])

In [27]:
# size of any dimensional array is the number of elements
np.size(m)

20

In [28]:
# we can also ask for the size along a given axis (axis 0 is the rows)
np.size(m,0)

5

In [29]:
np.size(m,1)

4

**Selecting array elements**

In [30]:
a1=np.arange(10)

In [31]:
a1[0]

0

In [32]:
a1[0],a1[2]

(0, 2)

In [33]:
m

array([[ 0,  1,  2,  3],
       [ 4,  5,  6,  7],
       [ 8,  9, 10, 11],
       [12, 13, 14, 15],
       [16, 17, 18, 19]])

In [34]:
m[0]

array([0, 1, 2, 3])

In [35]:
# select an element at row 1 and column 2
m[1,2]

6

In [36]:
# all items in a row
m[0,]

array([0, 1, 2, 3])

In [37]:
# all the items in column 2
m[:,2]

array([ 2,  6, 10, 14, 18])

**Logical operations on arrays**

In [38]:
# which items are less than two
a=np.arange(5)
a<2

array([ True,  True, False, False, False])

In [39]:
# this is commented out because it raises an exception:
# Python's `or` needs a single truth value, which a multi-element array does not have
# print(a<2 or a>3)

In [40]:
(a<2) | (a>3)

array([ True,  True, False, False,  True])
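
The parentheses matter here because `|` binds more tightly than the comparison operators. A small sketch showing the exception raised by `or` and two element-wise alternatives; the variable names are illustrative.

```python
import numpy as np

a = np.arange(5)

# `or` asks for a single truth value, which a multi-element array cannot give
try:
    a < 2 or a > 3
    raised = False
except ValueError:
    raised = True

# Element-wise alternatives: | (with parentheses) or np.logical_or
with_operator = (a < 2) | (a > 3)
with_function = np.logical_or(a < 2, a > 3)

print(raised)          # True
print(with_operator)   # [ True  True False False  True]
```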

In [41]:
a

array([0, 1, 2, 3, 4])

In [42]:
def exp(x):
  return x<3 or x>3

In [43]:
# np.vectorize applies the function to every item in an array
np.vectorize(exp)(a)

array([ True,  True,  True, False,  True])

**Boolean selection**

In [44]:
r=a<3
print(r)
a[r]

[ True  True  True False False]


array([0, 1, 2])

In [45]:
# np.sum counts True as 1 and False as 0
# so this is how many items are less than 3
np.sum(a<3)

3
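
A boolean mask can be used for more than reading values. The following sketch, with illustrative names, counts, selects, assigns through, and inverts a mask:

```python
import numpy as np

a = np.arange(5)
mask = a < 3

# Counting: np.count_nonzero gives the same answer as np.sum on a mask
count = np.count_nonzero(mask)

# Selection returns a new array with only the matching items
selected = a[mask]

# Assignment through a mask changes the matching items in place
b = a.copy()
b[mask] = -1

# ~ inverts a mask
not_selected = a[~mask]

print(count, selected, b, not_selected)
```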

In [46]:
# this can be applied across two arrays
a1=np.arange(0,5)
a2=np.arange(5,0,-1)
a1<a2

array([ True,  True,  True, False, False])

In [47]:
# this also works with multidimensional arrays
a1=np.arange(0,9).reshape(3,3)
a2=np.arange(9,0,-1).reshape(3,3)

a1<a2

array([[ True,  True,  True],
       [ True,  True, False],
       [False, False, False]])

**Slicing arrays**

Slices take the form start:end:step.

In [48]:
# get all items in the array from position 3 up to, but not including, position 8
a1=np.arange(1,10)
print(a1)
a1[3:8]

[1 2 3 4 5 6 7 8 9]


array([4, 5, 6, 7, 8])

In [49]:
# every other item
a1[::2]

array([1, 3, 5, 7, 9])

In [50]:
# in reverse order
a1[::-1]

array([9, 8, 7, 6, 5, 4, 3, 2, 1])

In [51]:
# the stop position 0 is excluded, so the first item (1) is not in the result
a1[9:0:-1]

array([9, 8, 7, 6, 5, 4, 3, 2])

In [52]:
# all items from position 5 onwards
a1[5:]

array([6, 7, 8, 9])

In [53]:
# the items in the first 5 positions
a1[:5]

array([1, 2, 3, 4, 5])
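
One difference from Python lists is worth noting: slicing a NumPy array returns a view onto the same data rather than a copy. A minimal sketch, with illustrative names `view` and `independent`:

```python
import numpy as np

a1 = np.arange(1, 10)

# A slice is a view: it shares memory with the original array
view = a1[3:8]
view[0] = 100          # also changes a1[3]

# .copy() gives an independent array
independent = a1[3:8].copy()
independent[1] = -1    # a1 is not affected

print(a1)  # [  1   2   3 100   5   6   7   8   9]
```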

Two-dimensional arrays can also be sliced

In [54]:
# get items in column position 1, all rows
m[:,1]

array([ 1,  5,  9, 13, 17])

In [55]:
# all rows, but only the columns in positions 1 up to but not including 3
m[:,1:3]

array([[ 1,  2],
       [ 5,  6],
       [ 9, 10],
       [13, 14],
       [17, 18]])

In [56]:
# row positions 3 up to but not including 5, all columns
m[3:5,:]

array([[12, 13, 14, 15],
       [16, 17, 18, 19]])

In [57]:
m[3:5,]

array([[12, 13, 14, 15],
       [16, 17, 18, 19]])

Rows and columns can be sliced at the same time to pull out a submatrix of the matrix:

In [58]:
m[1:3,1:3]

array([[ 5,  6],
       [ 9, 10]])

Using a Python list of positions, we can select non-contiguous rows or columns:

In [59]:
m[[1,3,4],:]

array([[ 4,  5,  6,  7],
       [12, 13, 14, 15],
       [16, 17, 18, 19]])
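
The same list-based selection works on columns. Unlike a slice, indexing with a list returns a copy of the data. A short sketch, assuming the 5x4 matrix `m` built earlier with `np.arange(0,20).reshape(5,4)`:

```python
import numpy as np

m = np.arange(0, 20).reshape(5, 4)

# Non-contiguous columns: first and last
cols = m[:, [0, 3]]

# Indexing with a list returns a copy, so editing it leaves m untouched
rows = m[[1, 3, 4], :]
rows[0, 0] = -1

print(cols.shape)  # (5, 2)
print(m[1, 0])     # 4
```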

# **Reshaping arrays**

create a 9-element one-dimensional array (shape (9,))

In [60]:
a1=np.arange(0,9)

reshape to a 3x3 2-d array

In [61]:
m=a1.reshape(3,3)
m

array([[0, 1, 2],
       [3, 4, 5],
       [6, 7, 8]])

we can reshape downward in dimensions too

In [62]:
m2=m.reshape(9)
m2

array([0, 1, 2, 3, 4, 5, 6, 7, 8])

.ravel() generates a one-dimensional array representing the flattened 2-d array

In [63]:
print(m)
raveled=m.ravel()
raveled

[[0 1 2]
 [3 4 5]
 [6 7 8]]


array([0, 1, 2, 3, 4, 5, 6, 7, 8])

it does not alter the shape of the source

In [64]:
m

array([[0, 1, 2],
       [3, 4, 5],
       [6, 7, 8]])

Even though .reshape() and .ravel() do not change the shape of the original
array or matrix, they return a view into its data whenever the memory layout
allows it, rather than a copy. If you change an element in this view, the value in
the original array or matrix is changed. The following example demonstrates this
ability to change items of the original matrix through the view:

In [65]:
reshaped=m.reshape(np.size(m))
raveled=m.ravel()
print(raveled,'\n',reshaped)

[0 1 2 3 4 5 6 7 8] 
 [0 1 2 3 4 5 6 7 8]


In [66]:
reshaped[2]=1000
raveled[5]=2000
m

array([[   0,    1, 1000],
       [   3,    4, 2000],
       [   6,    7,    8]])
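
Whether the result is a view or a copy depends on how the data is laid out in memory. As a hedged sketch, `np.shares_memory()` is one way to check; the variable names are illustrative.

```python
import numpy as np

m = np.arange(9).reshape(3, 3)

# Contiguous data: ravel() and reshape() can return views
ravel_is_view = np.shares_memory(m, m.ravel())
reshape_is_view = np.shares_memory(m, m.reshape(9))

# The transpose is not laid out row by row, so ravel() has to copy
transposed_ravel_is_view = np.shares_memory(m, m.T.ravel())

print(ravel_is_view, reshape_is_view, transposed_ravel_is_view)  # True True False
```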

The .flatten() method functions similarly to .ravel() but returns a new
array with copied data instead of a view. Changes to the result do not change the
original matrix:

In [67]:
flattened=m.flatten()
flattened

array([   0,    1, 1000,    3,    4, 2000,    6,    7,    8])

In [68]:
flattened[0]=5000

In [69]:
flattened

array([5000,    1, 1000,    3,    4, 2000,    6,    7,    8])

In [70]:
m

array([[   0,    1, 1000],
       [   3,    4, 2000],
       [   6,    7,    8]])

The .shape property returns a tuple representing the shape of the array:

In [71]:
m.shape

(3, 3)

In [72]:
flattened.shape

(9,)

we can reshape by assigning a tuple to the .shape property; we start with flattened, which has one dimension

In [73]:
flattened.shape=(3,3)

In [74]:
flattened

array([[5000,    1, 1000],
       [   3,    4, 2000],
       [   6,    7,    8]])

transpose a matrix

In [75]:
flattened.transpose()

array([[5000,    3,    6],
       [   1,    4,    7],
       [1000, 2000,    8]])

In [76]:
flattened

array([[5000,    1, 1000],
       [   3,    4, 2000],
       [   6,    7,    8]])

we can also use the .T property to transpose

In [77]:
flattened.T

array([[5000,    3,    6],
       [   1,    4,    7],
       [1000, 2000,    8]])

The .resize() method functions similarly to the .reshape() method, except
that while .reshape() returns a new array object and leaves the shape of the
original untouched, .resize() changes the shape of the array in place:

In [78]:
# we can also use .resize, which changes shape of an object in-place
print(m)
m.resize(1,9)
m

[[   0    1 1000]
 [   3    4 2000]
 [   6    7    8]]


array([[   0,    1, 1000,    3,    4, 2000,    6,    7,    8]])

shape has been changed
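
A minimal sketch of the practical difference between the two methods, using a small illustrative array `a`: `.reshape()` hands back a result, while `.resize()` modifies the array and returns `None`.

```python
import numpy as np

a = np.arange(6)

# reshape returns a new array object and leaves a's own shape alone
reshaped_shape = a.reshape(2, 3).shape
shape_after_reshape = a.shape   # still (6,)

# resize changes a itself and returns None
returned = a.resize(3, 2)
shape_after_resize = a.shape    # now (3, 2)

print(shape_after_reshape, shape_after_resize, returned)  # (6,) (3, 2) None
```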

# **Combining arrays**

creating two arrays for example

In [79]:
a=np.arange(9).reshape(3,3)
b=(a+1)*10
print(a,'\n\n',b)

[[0 1 2]
 [3 4 5]
 [6 7 8]] 

 [[10 20 30]
 [40 50 60]
 [70 80 90]]


horizontal stacking

In [80]:
np.hstack((a,b))

array([[ 0,  1,  2, 10, 20, 30],
       [ 3,  4,  5, 40, 50, 60],
       [ 6,  7,  8, 70, 80, 90]])

This is functionally equivalent to using the np.concatenate() function while
specifying axis=1:

In [81]:
np.concatenate((a,b),axis=1)

array([[ 0,  1,  2, 10, 20, 30],
       [ 3,  4,  5, 40, 50, 60],
       [ 6,  7,  8, 70, 80, 90]])

vertical stacking

In [82]:
np.vstack((a,b))

array([[ 0,  1,  2],
       [ 3,  4,  5],
       [ 6,  7,  8],
       [10, 20, 30],
       [40, 50, 60],
       [70, 80, 90]])

Like np.hstack(), this is equivalent to using the np.concatenate() function, this time
specifying axis=0:

In [83]:
np.concatenate((a,b),axis=0)

array([[ 0,  1,  2],
       [ 3,  4,  5],
       [ 6,  7,  8],
       [10, 20, 30],
       [40, 50, 60],
       [70, 80, 90]])
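
The two equivalences can be checked directly with `np.array_equal()`. A short sketch using the same `a` and `b` as above; `h_same` and `v_same` are illustrative names.

```python
import numpy as np

a = np.arange(9).reshape(3, 3)
b = (a + 1) * 10

# hstack joins along axis 1 (columns), vstack along axis 0 (rows)
h_same = np.array_equal(np.hstack((a, b)), np.concatenate((a, b), axis=1))
v_same = np.array_equal(np.vstack((a, b)), np.concatenate((a, b), axis=0))

print(h_same, v_same)                 # True True
print(np.hstack((a, b)).shape)        # (3, 6)
print(np.vstack((a, b)).shape)        # (6, 3)
```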

Depth stacking

np.dstack() stacks the arrays along a third (depth) axis, pairing each element of a with the element of b at the same position

In [84]:
np.dstack((a,b))

array([[[ 0, 10],
        [ 1, 20],
        [ 2, 30]],

       [[ 3, 40],
        [ 4, 50],
        [ 5, 60]],

       [[ 6, 70],
        [ 7, 80],
        [ 8, 90]]])
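
One way to read the depth-stacked result is as layers along a new last axis. The sketch below, with illustrative names, inspects the shape and recovers the original arrays from the layers:

```python
import numpy as np

a = np.arange(9).reshape(3, 3)
b = (a + 1) * 10

d = np.dstack((a, b))

# The inputs become "layers" along a new last axis
print(d.shape)      # (3, 3, 2)
print(d[0, 1])      # [ 1 20]

# Each layer is one of the original arrays
layer0_is_a = np.array_equal(d[:, :, 0], a)
layer1_is_b = np.array_equal(d[:, :, 1], b)

# For 2-d inputs this matches np.stack with axis=2
same_as_stack = np.array_equal(d, np.stack((a, b), axis=2))
```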

Column stacking

In [85]:
a=np.arange(5)
b=np.arange(5,10)
a.size==b.size

True

In [86]:
np.column_stack((a,b))

array([[0, 5],
       [1, 6],
       [2, 7],
       [3, 8],
       [4, 9]])

Row stacking (np.row_stack is an alias for np.vstack)

In [87]:
np.row_stack((a,b))

array([[0, 1, 2, 3, 4],
       [5, 6, 7, 8, 9]])

# **Splitting Arrays**

Horizontal split 

In [88]:
a=np.arange(12).reshape(3,4)
a

array([[ 0,  1,  2,  3],
       [ 4,  5,  6,  7],
       [ 8,  9, 10, 11]])

In [89]:
np.hsplit(a,4)

[array([[0],
        [4],
        [8]]), array([[1],
        [5],
        [9]]), array([[ 2],
        [ 6],
        [10]]), array([[ 3],
        [ 7],
        [11]])]

In [90]:
np.hsplit(a,2)

[array([[0, 1],
        [4, 5],
        [8, 9]]), array([[ 2,  3],
        [ 6,  7],
        [10, 11]])]

In [91]:
np.hsplit(a,[3])

[array([[ 0,  1,  2],
        [ 4,  5,  6],
        [ 8,  9, 10]]), array([[ 3],
        [ 7],
        [11]])]

split at columns 1,3


In [92]:
np.hsplit(a,[1,3])

[array([[0],
        [4],
        [8]]), array([[ 1,  2],
        [ 5,  6],
        [ 9, 10]]), array([[ 3],
        [ 7],
        [11]])]
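
When a list is passed, its entries are the column positions where the cuts fall. When an integer is passed, it must divide the axis evenly. A hedged sketch of both cases; `np.array_split()` is shown as an alternative for uneven divisions, and the variable names are illustrative.

```python
import numpy as np

a = np.arange(12).reshape(3, 4)

# A list of indices marks where the cuts fall: [:1], [1:3], [3:]
left, middle, right = np.hsplit(a, [1, 3])
matches_slices = (
    np.array_equal(left, a[:, :1])
    and np.array_equal(middle, a[:, 1:3])
    and np.array_equal(right, a[:, 3:])
)

# An integer must divide the axis evenly; 4 columns cannot be split in 3
try:
    np.hsplit(a, 3)
    uneven_raised = False
except ValueError:
    uneven_raised = True

# np.array_split accepts uneven divisions
widths = [part.shape[1] for part in np.array_split(a, 3, axis=1)]

print(matches_slices, uneven_raised, widths)  # True True [2, 1, 1]
```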

np.split()

In [93]:
a

array([[ 0,  1,  2,  3],
       [ 4,  5,  6,  7],
       [ 8,  9, 10, 11]])

In [94]:
np.split(a,2,axis=1)

[array([[0, 1],
        [4, 5],
        [8, 9]]), array([[ 2,  3],
        [ 6,  7],
        [10, 11]])]

In [95]:
np.split(a,3,axis=0)

[array([[0, 1, 2, 3]]), array([[4, 5, 6, 7]]), array([[ 8,  9, 10, 11]])]

Vertical split

In [96]:
a

array([[ 0,  1,  2,  3],
       [ 4,  5,  6,  7],
       [ 8,  9, 10, 11]])

In [97]:
np.vsplit(a,3)

[array([[0, 1, 2, 3]]), array([[4, 5, 6, 7]]), array([[ 8,  9, 10, 11]])]

In [98]:
np.vsplit(a,[0,2])

[array([], shape=(0, 4), dtype=int64), array([[0, 1, 2, 3],
        [4, 5, 6, 7]]), array([[ 8,  9, 10, 11]])]

In [100]:
# 3-d array
c=np.arange(27).reshape(3,3,3)
c

array([[[ 0,  1,  2],
        [ 3,  4,  5],
        [ 6,  7,  8]],

       [[ 9, 10, 11],
        [12, 13, 14],
        [15, 16, 17]],

       [[18, 19, 20],
        [21, 22, 23],
        [24, 25, 26]]])

split into 3

In [101]:
np.dsplit(c,3)

[array([[[ 0],
         [ 3],
         [ 6]],
 
        [[ 9],
         [12],
         [15]],
 
        [[18],
         [21],
         [24]]]), array([[[ 1],
         [ 4],
         [ 7]],
 
        [[10],
         [13],
         [16]],
 
        [[19],
         [22],
         [25]]]), array([[[ 2],
         [ 5],
         [ 8]],
 
        [[11],
         [14],
         [17]],
 
        [[20],
         [23],
         [26]]])]

# **Useful numerical methods of NumPy arrays**

demonstrate some of the properties of NumPy arrays


In [105]:
m = np.arange(10, 19).reshape(3, 3)
print(m)
print("{0} min of the entire matrix".format(m.min()))
print("{0} max of entire matrix".format(m.max()))
print("{0} position of the min value".format(m.argmin()))
print("{0} position of the max value".format(m.argmax()))
print("{0} mins down each column".format(m.min(axis = 0)))
print("{0} mins across each row".format(m.min(axis = 1)))
print("{0} maxs down each column".format(m.max(axis = 0)))
print("{0} maxs across each row".format(m.max(axis = 1)))

[[10 11 12]
 [13 14 15]
 [16 17 18]]
10 min of the entire matrix
18 max of entire matrix
0 position of the min value
8 position of the max value
[10 11 12] mins down each column
[10 13 16] mins across each row
[16 17 18] maxs down each column
[12 15 18] maxs across each row
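
The positions reported by .argmin() and .argmax() index into the flattened array, which is why the max sits at position 8 in a 3x3 matrix. A small sketch of converting such a position back to a (row, column) pair with `np.unravel_index()`; the variable names are illustrative.

```python
import numpy as np

m = np.arange(10, 19).reshape(3, 3)

# argmax() without an axis indexes into the flattened array
flat_pos = m.argmax()

# np.unravel_index converts that flat position into (row, column)
row, col = np.unravel_index(flat_pos, m.shape)

# With an axis, argmax returns one position per column (axis=0) or row (axis=1)
per_column = m.argmax(axis=0)
per_row = m.argmax(axis=1)

print(flat_pos, (int(row), int(col)))  # 8 (2, 2)
print(per_column, per_row)             # [2 2 2] [2 2 2]
```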


The .mean() , .std() , and .var() methods compute the mathematical mean,
standard deviation, and variance of the values in an array:

In [106]:
# demonstrate included statistical methods

In [107]:
a=np.arange(10)
a

array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

In [108]:
a.mean(),a.std(),a.var()

(4.5, 2.8722813232690143, 8.25)
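
By default these NumPy methods compute population statistics, dividing by N. Passing `ddof=1` divides by N - 1 instead, which gives the sample statistics. This matters when comparing with pandas, whose `Series.std()` and `Series.var()` use `ddof=1` by default. A minimal sketch with illustrative names:

```python
import numpy as np

a = np.arange(10)

# Default: population statistics (divide by N)
pop_var = a.var()
pop_std = a.std()

# ddof=1: sample statistics (divide by N - 1)
sample_var = a.var(ddof=1)
sample_std = a.std(ddof=1)

print(pop_var, sample_var)
```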

demonstrate sum and prod

In [111]:
a=np.arange(1,5)
a

array([1, 2, 3, 4])

In [112]:
a.sum(),a.prod()

(10, 24)

In [113]:
a # and cumulative sum and prod

array([1, 2, 3, 4])

In [114]:
a.cumsum(),a.cumprod()

(array([ 1,  3,  6, 10]), array([ 1,  2,  6, 24]))

applying logical operators

In [115]:
a=np.arange(10)
a

array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

In [117]:
(a<5).any() # any < 5?

True

In [118]:
(a<5).all() # all < 5?

False

size is always the total number of elements

In [119]:
np.arange(10).reshape(2,5).size

10

.ndim gives the number of dimensions

In [120]:
np.arange(10).reshape(2,5).ndim

2