<a href="https://colab.research.google.com/github/sajid-munawar/Pandas-a-versatile-and-high-performance-Python-library-for-data-manipulation-analysis-and-discover/blob/main/Chapter_3_Numpy_for_Pandas.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

## Chapter 3: NumPy for Pandas

In [236]:
#Importing numpy
import numpy as np

Benefits and characteristics of NumPy arrays:


*   Contiguous allocation in memory
*   Vectorized Operations
*   Boolean Selection
*   Sliceability
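
As a rough illustration of these four characteristics, the sketch below checks each one on a small array. The variable names are illustrative and do not come from the notebook.

```python
import numpy as np

arr = np.arange(10)                        # [0 1 2 3 4 5 6 7 8 9]

# Contiguous allocation: a freshly created array reports C-contiguous memory
is_contiguous = arr.flags['C_CONTIGUOUS']

# Vectorized operation: one expression acts on every element, no Python loop
doubled = arr * 2

# Boolean selection: a boolean mask picks out the matching elements
evens = arr[arr % 2 == 0]

# Sliceability: start:end:step notation selects a sub-range
middle = arr[3:7]

print(is_contiguous, doubled, evens, middle)
```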



In [237]:
# A function that squares all the values in a sequence
def squares(values):
  result=[]
  for v in values:  # iterate over the input values
    result.append(v*v)
  return result
to_squares=range(100000)
# time the pure-Python loop; compare it with the NumPy version in the next cell
%timeit squares(to_squares)


In [238]:
# The same operation using NumPy
values_to_square=np.arange(0,100000)
%timeit values_to_square**2

10000 loops, best of 5: 139 µs per loop
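
Outside a notebook the `%timeit` magic is not available, so the comparison can be sketched with the standard-library `timeit` module instead. This is a minimal sketch: the run count of 5 is an arbitrary choice, and absolute timings depend on the machine, so only the equality of the results is checked.

```python
import timeit
import numpy as np

def squares(values):
    # pure-Python loop: one multiplication per iteration
    result = []
    for v in values:
        result.append(v * v)
    return result

n = 100_000
py_values = range(n)
np_values = np.arange(n, dtype=np.int64)   # int64 so the squares cannot overflow

# time a handful of runs of each approach (numbers vary by machine)
loop_seconds = timeit.timeit(lambda: squares(py_values), number=5)
vector_seconds = timeit.timeit(lambda: np_values ** 2, number=5)

# both approaches produce the same values
same = squares(py_values) == (np_values ** 2).tolist()
print(f"loop: {loop_seconds:.4f}s  vectorized: {vector_seconds:.4f}s  same: {same}")
```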


In [239]:
# Creating NumPy arrays and performing basic array operations

In [240]:
# a simple array
a1=np.array([1,2,3,4,5])
a1

array([1, 2, 3, 4, 5])

In [241]:
type(a1)

numpy.ndarray

In [242]:
np.size(a1)

5

In [243]:
a1.dtype

dtype('int64')

In [244]:
# a single float value makes the whole array float
a2=np.array([1,2,3,4,5.5])
a2

array([1. , 2. , 3. , 4. , 5.5])

In [245]:
a2.dtype

dtype('float64')

In [246]:
# convert to int64; the fractional part is dropped (5.5 becomes 5)
a2=a2.astype('int64')

In [247]:
a2

array([1, 2, 3, 4, 5])
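
Because `.astype('int64')` drops the fractional part rather than rounding, it can help to compare the two behaviors side by side. The sketch below uses made-up sample values; `np.round` is brought in only for contrast.

```python
import numpy as np

values = np.array([1.9, -1.9, 2.5, 5.5])

# astype drops the fractional part (truncation toward zero)
truncated = values.astype('int64')

# rounding first gives the nearest integer (halves go to the nearest even value)
rounded = np.round(values).astype('int64')

print(truncated)  # [ 1 -1  2  5]
print(rounded)    # [ 2 -2  2  6]
```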

In [248]:
# shorthand to repeat a sequence of numbers
a3=np.array([1]*10)
a3

array([1, 1, 1, 1, 1, 1, 1, 1, 1, 1])

In [249]:
# convert a python range to numpy array
a4=np.array(range(10))
a4

array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

In [250]:
# create an array of zeros
np.zeros(10)

array([0., 0., 0., 0., 0., 0., 0., 0., 0., 0.])

In [251]:
# force it to be int instead of float
np.zeros(10,dtype="int64")

array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0])

In [252]:
# make a range of 10 items starting at zero
np.arange(0,10)

array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

In [253]:
# generate the even numbers from 0 up to (but not including) 10
np.arange(0,10,2)

array([0, 2, 4, 6, 8])

In [254]:
# counting down
np.arange(10,0,-1)

array([10,  9,  8,  7,  6,  5,  4,  3,  2,  1])

In [255]:
# 11 evenly spaced values from 0 to 10 (both endpoints included)
np.linspace(0,10,11)

array([ 0.,  1.,  2.,  3.,  4.,  5.,  6.,  7.,  8.,  9., 10.])
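
The difference between `np.linspace` (a number of points, end value included) and `np.arange` (a step size, end value excluded) is easy to mix up. The following sketch contrasts them on the interval 0 to 1; the interval and point counts are arbitrary choices for illustration.

```python
import numpy as np

# np.linspace takes the number of points and includes the end value by default
points = np.linspace(0, 1, 5)                        # 0, 0.25, 0.5, 0.75, 1

# endpoint=False leaves the end value out
open_points = np.linspace(0, 1, 5, endpoint=False)   # 0, 0.2, 0.4, 0.6, 0.8

# np.arange takes a step instead and never includes the end value
stepped = np.arange(0, 1, 0.25)                      # 0, 0.25, 0.5, 0.75

print(points, open_points, stepped)
```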

In [256]:
# multiply an array by 2
a1=np.arange(10)
print(a1)
a1*2

[0 1 2 3 4 5 6 7 8 9]


array([ 0,  2,  4,  6,  8, 10, 12, 14, 16, 18])

In [257]:
# mathematical operation between two arrays (element-wise addition)
a2=np.arange(10,20)
a1+a2

array([10, 12, 14, 16, 18, 20, 22, 24, 26, 28])
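
Addition is only one of the element-wise operations. A short sketch of a few others, using small illustrative arrays that are not part of the notebook:

```python
import numpy as np

x = np.arange(1, 5)          # [1 2 3 4]
y = np.arange(10, 50, 10)    # [10 20 30 40]

diff = y - x                 # [ 9 18 27 36]
prod = x * y                 # [ 10  40  90 160]
ratio = y / x                # true division always yields floats

# a scalar operand is applied to every element
shifted = x + 100            # [101 102 103 104]

print(diff, prod, ratio, shifted)
```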

In [258]:
# create a two-dimensional array (2x2)
a1=np.array([[1,2],[3,4]])
a1

array([[1, 2],
       [3, 4]])

In [259]:
# a more convenient way to create a two-dimensional array is to reshape a one-dimensional one with .reshape()
m=np.arange(0,20).reshape(5,4)

In [260]:
m

array([[ 0,  1,  2,  3],
       [ 4,  5,  6,  7],
       [ 8,  9, 10, 11],
       [12, 13, 14, 15],
       [16, 17, 18, 19]])

In [261]:
# the size of an array of any dimension is its total number of elements
np.size(m)

20

In [262]:
# we can ask for the size along a given axis (axis 0 is rows, axis 1 is columns)
np.size(m,0)

5

In [263]:
np.size(m,1)

4
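
The same information is also available as attributes on the array itself. The sketch below rebuilds the 5x4 matrix and relates `np.size` to `.shape`, `.ndim`, and `.size`.

```python
import numpy as np

m = np.arange(0, 20).reshape(5, 4)

rows, cols = m.shape     # the size along axis 0 and along axis 1
dims = m.ndim            # number of dimensions
total = m.size           # total number of elements, same as np.size(m)

print(rows, cols, dims, total)  # 5 4 2 20
```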

**Selecting array elements**

In [264]:
a1=np.arange(10)

In [265]:
a1[0]

0

In [266]:
a1[0],a1[2]

(0, 2)

In [267]:
m

array([[ 0,  1,  2,  3],
       [ 4,  5,  6,  7],
       [ 8,  9, 10, 11],
       [12, 13, 14, 15],
       [16, 17, 18, 19]])

In [268]:
m[0]

array([0, 1, 2, 3])

In [269]:
# select the element at row 1 and column 2
m[1,2]

6

In [270]:
# all items in a row
m[0,]

array([0, 1, 2, 3])

In [271]:
# all the items in column 2
m[:,2]

array([ 2,  6, 10, 14, 18])

**Logical operations on arrays**

In [272]:
# which items are less than two
a=np.arange(5)
a<2

array([ True,  True, False, False, False])

In [273]:
# this is commented out because it raises a ValueError:
# Python's `or` needs a single truth value, which an array does not have
# print(a<2 or a>3)

In [274]:
(a<2) | (a>3)

array([ True,  True, False, False,  True])
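
The element-wise operators `|`, `&`, and `~` take the place of Python's `or`, `and`, and `not` for arrays. A minimal sketch, including the exception that the plain `or` raises:

```python
import numpy as np

a = np.arange(5)                           # [0 1 2 3 4]

either = (a < 2) | (a > 3)                 # element-wise OR
both = (a > 0) & (a < 4)                   # element-wise AND
negated = ~(a < 2)                         # element-wise NOT
same_as_or = np.logical_or(a < 2, a > 3)   # function form of |

# Python's `or` asks for a single truth value, which an array cannot give
try:
    (a < 2) or (a > 3)
    raised = False
except ValueError:
    raised = True

print(either, both, negated, raised)
```

The parentheses matter: `|` and `&` bind more tightly than the comparison operators, so `a<2 | a>3` would not mean what it appears to.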

In [275]:
a

array([0, 1, 2, 3, 4])

In [276]:
def exp(x):
  return x<3 or x>3

In [277]:
# np.vectorize applies the function to every item in an array
np.vectorize(exp)(a)

array([ True,  True,  True, False,  True])
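
`np.vectorize` is a convenience wrapper that calls the Python function once per element. When the condition can be written with array operators, the same result is available without it. A sketch of the equivalence, where `not_three` is a hypothetical stand-in for the `exp` function above:

```python
import numpy as np

a = np.arange(5)

def not_three(x):
    # works on one scalar at a time
    return x < 3 or x > 3

via_vectorize = np.vectorize(not_three)(a)   # calls not_three once per element
via_operators = (a < 3) | (a > 3)            # evaluated inside NumPy
via_not_equal = a != 3                       # the same condition, stated directly

print(via_vectorize, via_operators, via_not_equal)
```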

**Boolean selection**

In [278]:
r=a<3
print(r)
a[r]

[ True  True  True False False]


array([0, 1, 2])

In [279]:
# np.sum counts True as 1 and False as 0
# so this is how many items are less than 3
np.sum(a<3)

3
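
A boolean mask can also be used to count matches, find their positions, or assign to the selected items. The sketch below shows one way to do each; `clipped` and the replacement value `-1` are illustrative choices.

```python
import numpy as np

a = np.arange(5)
mask = a < 3

count = np.count_nonzero(mask)     # how many items match
positions = np.where(mask)[0]      # the index positions of the matches

# assigning through the inverted mask replaces every item that does not match
clipped = a.copy()
clipped[~mask] = -1

print(count, positions, clipped)   # 3 [0 1 2] [ 0  1  2 -1 -1]
```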

In [280]:
# this can be applied across two arrays
a1=np.arange(0,5)
a2=np.arange(5,0,-1)
a1<a2

array([ True,  True,  True, False, False])

In [281]:
# this also works for multidimensional arrays
a1=np.arange(0,9).reshape(3,3)
a2=np.arange(9,0,-1).reshape(3,3)

a1<a2

array([[ True,  True,  True],
       [ True,  True, False],
       [False, False, False]])

**Slicing arrays**

 start:end:step

In [282]:
# get all items in the array from position 3 up to but not including position 8
a1=np.arange(1,10)
print(a1)
a1[3:8]

[1 2 3 4 5 6 7 8 9]


array([4, 5, 6, 7, 8])

In [283]:
# every other item
a1[::2]

array([1, 3, 5, 7, 9])

In [284]:
# in reverse order
a1[::-1]

array([9, 8, 7, 6, 5, 4, 3, 2, 1])

In [285]:
# the end position (0) is excluded, so the 1 does not appear in the result
a1[9:0:-1]

array([9, 8, 7, 6, 5, 4, 3, 2])

In [286]:
# all items from position 5 onwards
a1[5:]

array([6, 7, 8, 9])

In [287]:
# the items in the first 5 positions
a1[:5]

array([1, 2, 3, 4, 5])
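
Two further points about slices are worth a quick sketch: negative positions count from the end, and a basic slice is a view onto the original array rather than a copy. The values written below are arbitrary.

```python
import numpy as np

a1 = np.arange(1, 10)          # [1 2 3 4 5 6 7 8 9]

last_three = a1[-3:]           # negative positions count from the end

window = a1[3:8]               # a view: shares memory with a1
window[0] = 400                # writes through to a1[3]

independent = a1[3:8].copy()   # an explicit copy is detached from a1
independent[1] = -5            # a1 is not affected

print(a1)                      # [  1   2   3 400   5   6   7   8   9]
```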

Two-dimensional arrays can also be sliced

In [288]:
# get the items in column position 1, all rows
m[:,1]

array([ 1,  5,  9, 13, 17])

In [289]:
# all rows, columns in positions 1 up to but not including 3
m[:,1:3]

array([[ 1,  2],
       [ 5,  6],
       [ 9, 10],
       [13, 14],
       [17, 18]])

In [290]:
# row positions 3 up to but not including 5, all columns
m[3:5,:]

array([[12, 13, 14, 15],
       [16, 17, 18, 19]])

In [291]:
m[3:5,]

array([[12, 13, 14, 15],
       [16, 17, 18, 19]])

Rows and columns can be sliced at the same time to pull out a submatrix of the matrix:

In [292]:
m[1:3,1:3]

array([[ 5,  6],
       [ 9, 10]])

Using a Python list of positions, we can select non-contiguous rows or columns:

In [293]:
m[[1,3,4],:]

array([[ 4,  5,  6,  7],
       [12, 13, 14, 15],
       [16, 17, 18, 19]])
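
The same list-based selection works for columns, and `np.ix_` combines a list of rows with a list of columns. Unlike a basic slice, selecting with a list returns a copy. A brief sketch with arbitrarily chosen positions:

```python
import numpy as np

m = np.arange(0, 20).reshape(5, 4)

cols = m[:, [0, 3]]                     # columns 0 and 3 of every row
block = m[np.ix_([1, 3, 4], [0, 3])]    # rows 1, 3, 4 crossed with columns 0, 3

picked = m[[1, 3, 4], :]
picked[0, 0] = -1                       # picked is a copy, so m is unchanged

print(cols)
print(block)
```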

# **Reshaping arrays**

Create a one-dimensional array of 9 elements:

In [294]:
a1=np.arange(0,9)

Reshape it into a 3x3 two-dimensional array:

In [295]:
m=a1.reshape(3,3)
m

array([[0, 1, 2],
       [3, 4, 5],
       [6, 7, 8]])

We can also reshape down to fewer dimensions:

In [296]:
m2=m.reshape(9)
m2

array([0, 1, 2, 3, 4, 5, 6, 7, 8])

.ravel() generates a one-dimensional array representing the flattened 2-d array:

In [297]:
print(m)
raveled=m.ravel()
raveled

[[0 1 2]
 [3 4 5]
 [6 7 8]]


array([0, 1, 2, 3, 4, 5, 6, 7, 8])

It does not alter the shape of the source:

In [298]:
m

array([[0, 1, 2],
       [3, 4, 5],
       [6, 7, 8]])

Although .reshape() and .ravel() do not change the shape of the original
array or matrix, they return a view into its data whenever possible rather than a copy
(here, a one-dimensional view). If you change an element in this view, the value in the
original array or matrix also changes. The following example demonstrates changing
items of the original matrix through the view:

In [299]:
reshaped=m.reshape(np.size(m))
raveled=m.ravel()
print(raveled,'\n',reshaped)

[0 1 2 3 4 5 6 7 8] 
 [0 1 2 3 4 5 6 7 8]


In [300]:
reshaped[2]=1000
raveled[5]=2000
m

array([[   0,    1, 1000],
       [   3,    4, 2000],
       [   6,    7,    8]])
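
Whether a result is a view can be checked rather than assumed. The sketch below uses `np.shares_memory` for that, and also shows a case where `.ravel()` has to fall back to a copy because the source is not laid out contiguously in row order. The transposed input is an illustrative choice, not something the notebook uses at this point.

```python
import numpy as np

m = np.arange(9).reshape(3, 3)

raveled = m.ravel()
reshaped = m.reshape(9)

# both results are views onto the same memory as m
ravel_is_view = np.shares_memory(m, raveled)
reshape_is_view = np.shares_memory(m, reshaped)

# the transpose is not contiguous in row order, so ravel must copy here
ravel_of_transpose = m.T.ravel()
transpose_ravel_is_view = np.shares_memory(m, ravel_of_transpose)

print(ravel_is_view, reshape_is_view, transpose_ravel_is_view)  # True True False
```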

The .flatten() method functions similarly to .ravel() but returns a new
array with copied data rather than a view. Changes to the result do not change the
original matrix:

In [301]:
flattened=m.flatten()
flattened

array([   0,    1, 1000,    3,    4, 2000,    6,    7,    8])

In [302]:
flattened[0]=5000

In [303]:
flattened

array([5000,    1, 1000,    3,    4, 2000,    6,    7,    8])

In [304]:
m

array([[   0,    1, 1000],
       [   3,    4, 2000],
       [   6,    7,    8]])

The .shape property returns a tuple representing the shape of the array:

In [305]:
m.shape

(3, 3)

In [306]:
flattened.shape

(9,)

We can also reshape in place by assigning a tuple to the .shape property. We start with flattened, which has one dimension:

In [307]:
flattened.shape=(3,3)

In [308]:
flattened

array([[5000,    1, 1000],
       [   3,    4, 2000],
       [   6,    7,    8]])

Transpose a matrix:

In [309]:
flattened.transpose()

array([[5000,    3,    6],
       [   1,    4,    7],
       [1000, 2000,    8]])

In [310]:
flattened

array([[5000,    1, 1000],
       [   3,    4, 2000],
       [   6,    7,    8]])

We can also use the .T property to transpose:

In [311]:
flattened.T

array([[5000,    3,    6],
       [   1,    4,    7],
       [1000, 2000,    8]])
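
The original is left unchanged by a transpose because the result is a view with its axes swapped, not a rearranged copy. A small sketch on a hypothetical 2x3 array `t`:

```python
import numpy as np

t = np.arange(6).reshape(2, 3)     # [[0 1 2], [3 4 5]]
tt = t.T                           # shape (3, 2)

shares = np.shares_memory(t, tt)   # the transpose is a view of t

# writing through the transposed view changes the mirrored position in t
tt[0, 1] = 99                      # this is t[1, 0]

print(t.shape, tt.shape, shares)
print(t)
```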

The .resize() method functions similarly to the .reshape() method, except
that while .reshape() returns a new array object (a view of the same data when possible)
and leaves the original's shape untouched, .resize() changes the shape of the array in place:

In [312]:
# we can also use .resize, which changes shape of an object in-place
print(m)
m.resize(1,9)
m

[[   0    1 1000]
 [   3    4 2000]
 [   6    7    8]]


array([[   0,    1, 1000,    3,    4, 2000,    6,    7,    8]])
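
The in-place method should not be confused with the `np.resize()` function, which returns a new array and repeats the data when the requested shape holds more elements than the source. A sketch of both, using small arrays made up for the purpose:

```python
import numpy as np

a = np.arange(6)
result = a.resize(2, 3)        # in place: a itself changes, the call returns None

b = np.arange(4)
grown = np.resize(b, (2, 3))   # new array; b's values repeat to fill six slots

print(a.shape, result)         # (2, 3) None
print(grown)                   # [[0 1 2]
                               #  [3 0 1]]
```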

The shape of m itself has been changed.

# **Combining arrays**

Create two arrays to use in the examples:

In [317]:
a=np.arange(9).reshape(3,3)
b=(a+1)*10
print(a,'\n\n',b)

[[0 1 2]
 [3 4 5]
 [6 7 8]] 

 [[10 20 30]
 [40 50 60]
 [70 80 90]]


Horizontal stacking

In [319]:
np.hstack((a,b))

array([[ 0,  1,  2, 10, 20, 30],
       [ 3,  4,  5, 40, 50, 60],
       [ 6,  7,  8, 70, 80, 90]])

This is functionally equivalent to using the np.concatenate() function while
specifying axis=1:

In [322]:
np.concatenate((a,b),axis=1)

array([[ 0,  1,  2, 10, 20, 30],
       [ 3,  4,  5, 40, 50, 60],
       [ 6,  7,  8, 70, 80, 90]])

Vertical stacking

In [323]:
np.vstack((a,b))

array([[ 0,  1,  2],
       [ 3,  4,  5],
       [ 6,  7,  8],
       [10, 20, 30],
       [40, 50, 60],
       [70, 80, 90]])

As with np.hstack(), this is equivalent to using the np.concatenate() function, but
specifying axis=0:

In [324]:
np.concatenate((a,b),axis=0)

array([[ 0,  1,  2],
       [ 3,  4,  5],
       [ 6,  7,  8],
       [10, 20, 30],
       [40, 50, 60],
       [70, 80, 90]])
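
`np.concatenate()` accepts any number of arrays, as long as their shapes agree on every axis other than the one being joined. The sketch below shows a join that works and one that fails; `extra_row` is an illustrative array, not part of the notebook.

```python
import numpy as np

a = np.arange(9).reshape(3, 3)
extra_row = np.array([[100, 200, 300]])            # shape (1, 3)

taller = np.concatenate((a, extra_row), axis=0)    # column counts match: (4, 3)
three_way = np.concatenate((a, a, a), axis=1)      # more than two arrays: (3, 9)

# joining along axis 1 needs equal row counts, and 3 != 1
try:
    np.concatenate((a, extra_row), axis=1)
    mismatch_raised = False
except ValueError:
    mismatch_raised = True

print(taller.shape, three_way.shape, mismatch_raised)   # (4, 3) (3, 9) True
```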

Depth stacking

np.dstack() stacks the arrays along a third (depth) axis, pairing each element of a with the element of b at the same position:

In [325]:
np.dstack((a,b))

array([[[ 0, 10],
        [ 1, 20],
        [ 2, 30]],

       [[ 3, 40],
        [ 4, 50],
        [ 5, 60]],

       [[ 6, 70],
        [ 7, 80],
        [ 8, 90]]])
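
The shape of the result makes the depth axis easier to see. For two-dimensional inputs like these, the result matches `np.stack()` with `axis=2`, as the following sketch checks:

```python
import numpy as np

a = np.arange(9).reshape(3, 3)
b = (a + 1) * 10

depth = np.dstack((a, b))             # shape (3, 3, 2)
stacked = np.stack((a, b), axis=2)    # the same result for 2-d inputs

# each input can be recovered by indexing the last axis
first_layer = depth[:, :, 0]
second_layer = depth[:, :, 1]

print(depth.shape)                    # (3, 3, 2)
```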

Column stacking

In [326]:
a=np.arange(5)
b=np.arange(5,10)
a.size==b.size

True

In [328]:
np.column_stack((a,b))

array([[0, 5],
       [1, 6],
       [2, 7],
       [3, 8],
       [4, 9]])

Row stacking

In [329]:
# np.row_stack is an alias of np.vstack; newer NumPy releases deprecate it in favor of np.vstack
np.row_stack((a,b))

array([[0, 1, 2, 3, 4],
       [5, 6, 7, 8, 9]])
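
For one-dimensional inputs, column stacking and row stacking are transposes of each other. The sketch below checks this using `np.vstack()` for the row form:

```python
import numpy as np

a = np.arange(5)
b = np.arange(5, 10)

as_rows = np.vstack((a, b))             # each input becomes a row: shape (2, 5)
as_columns = np.column_stack((a, b))    # each input becomes a column: shape (5, 2)

transposes_match = np.array_equal(as_columns, as_rows.T)

print(as_rows.shape, as_columns.shape, transposes_match)   # (2, 5) (5, 2) True
```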