
## Chapter 3: NumPy for Pandas

In [1]:
# Import NumPy
import numpy as np

Benefits and characteristics of NumPy arrays:


*   Contiguous allocation in memory
*   Vectorized Operations
*   Boolean Selection
*   Sliceability
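
Each of these characteristics can be seen on a small array. The following is a minimal sketch; the variable names are illustrative only.

```python
import numpy as np

arr = np.arange(10)

# Contiguous allocation: the elements sit in one block of memory
is_contiguous = arr.flags['C_CONTIGUOUS']

# Vectorized operation: one expression squares every element
squared = arr ** 2

# Boolean selection: keep only the elements matching a condition
evens = arr[arr % 2 == 0]

# Sliceability: a basic slice is a view onto the same memory, not a copy
first_three = arr[:3]
shares_memory = np.shares_memory(arr, first_three)

print(is_contiguous, squared, evens, first_three, shares_memory)
```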



In [2]:
# A function that squares all the values in a sequence
def squares(values):
  result=[]
  # iterate over the input values (not over the empty result list)
  for v in values:
    result.append(v*v)
  return result
to_squares=range(100000)
# timings vary by machine; the pure-Python loop is typically far slower
# than the NumPy version in the next cell
%timeit squares(to_squares)


In [3]:
# The same operation using NumPy
values_to_square=np.arange(0,100000)
%timeit values_to_square**2

The slowest run took 35.43 times longer than the fastest. This could mean that an intermediate result is being cached.
10000 loops, best of 5: 128 µs per loop
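
`%timeit` is an IPython magic, so the comparison above only runs inside a notebook. A rough equivalent with the standard-library `timeit` module might look like the sketch below. The names `py_values` and `np_values` and the choice of 10 repetitions are illustrative, and the absolute timings depend on the machine.

```python
import timeit
import numpy as np

def squares(values):
    # pure-Python loop: one multiplication and one append per element
    result = []
    for v in values:
        result.append(v * v)
    return result

n = 100000
py_values = range(n)
np_values = np.arange(n)

# time 10 runs of each approach; absolute numbers depend on the machine
py_time = timeit.timeit(lambda: squares(py_values), number=10)
np_time = timeit.timeit(lambda: np_values ** 2, number=10)
print(f"pure Python: {py_time:.4f} s, NumPy: {np_time:.4f} s")
```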


In [4]:
# Creating NumPy arrays and performing basic array operations

In [5]:
# a simple array
a1=np.array([1,2,3,4,5])
a1

array([1, 2, 3, 4, 5])

In [6]:
type(a1)

numpy.ndarray

In [7]:
np.size(a1)

5

In [8]:
a1.dtype

dtype('int64')

In [13]:
# a single float value makes the whole array float
a2=np.array([1,2,3,4,5.5])
a2

array([1. , 2. , 3. , 4. , 5.5])

In [14]:
a2.dtype

dtype('float64')

In [17]:
# convert back to integers; the fractional part of 5.5 is dropped
a2=a2.astype('int64')

In [18]:
a2

array([1, 2, 3, 4, 5])
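
Note that `astype` to an integer type truncates rather than rounds, which is why 5.5 became 5 above. A small sketch of the difference, using an illustrative array `values`:

```python
import numpy as np

values = np.array([1.0, 2.0, 3.0, 4.0, 5.5, -5.5])

# astype drops the fractional part (truncates toward zero)
truncated = values.astype('int64')

# np.rint rounds to the nearest integer first (halves go to the even neighbour)
rounded = np.rint(values).astype('int64')

print(truncated)  # [ 1  2  3  4  5 -5]
print(rounded)    # [ 1  2  3  4  6 -6]
```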

In [19]:
# shorthand to repeat a sequence of numbers
a3=np.array([1]*10)
a3

array([1, 1, 1, 1, 1, 1, 1, 1, 1, 1])

In [20]:
# convert a Python range to a NumPy array
a4=np.array(range(10))
a4

array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

In [21]:
# create an array of zeros
np.zeros(10)

array([0., 0., 0., 0., 0., 0., 0., 0., 0., 0.])

In [23]:
# force it to be int instead of float
np.zeros(10,dtype="int64")

array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0])

In [25]:
# make a range of 10 values starting at zero
np.arange(0,10)

array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

In [27]:
# generate the even numbers from 0 up to (but not including) 10
np.arange(0,10,2)

array([0, 2, 4, 6, 8])

In [28]:
# counting down
np.arange(10,0,-1)

array([10,  9,  8,  7,  6,  5,  4,  3,  2,  1])

In [29]:
# 11 evenly spaced values from 0 to 10 (both endpoints included)
np.linspace(0,10,11)

array([ 0.,  1.,  2.,  3.,  4.,  5.,  6.,  7.,  8.,  9., 10.])
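
The main difference between `np.arange` and `np.linspace` is that the first takes a step and excludes the stop value, while the second takes a count and includes the stop value by default. A short sketch with illustrative values:

```python
import numpy as np

# arange takes a step and excludes the stop value
by_step = np.arange(0, 10, 2.5)

# linspace takes a count and includes the stop value by default
by_count = np.linspace(0, 10, 5)

# endpoint=False makes linspace exclude the stop value as well
half_open = np.linspace(0, 10, 4, endpoint=False)

print(by_step)    # [0.  2.5 5.  7.5]
print(by_count)   # [ 0.   2.5  5.   7.5 10. ]
print(half_open)  # [0.  2.5 5.  7.5]
```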

In [30]:
# multiply every element of an array by 2
a1=np.arange(10)
print(a1)
a1*2

[0 1 2 3 4 5 6 7 8 9]


array([ 0,  2,  4,  6,  8, 10, 12, 14, 16, 18])

In [31]:
# mathematical operations between two arrays are applied element by element
a2=np.arange(10,20)
a1+a2

array([10, 12, 14, 16, 18, 20, 22, 24, 26, 28])
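
The other arithmetic operators behave the same way. The sketch below, with illustrative arrays `x` and `y`, also shows that `*` is element-wise rather than a dot product, and that arrays of incompatible lengths raise an error.

```python
import numpy as np

x = np.arange(5)          # [0 1 2 3 4]
y = np.arange(10, 15)     # [10 11 12 13 14]

difference = y - x        # element-wise subtraction
product = x * y           # element-wise multiplication, not a dot product
dot = np.dot(x, y)        # the dot product is a separate function

# arrays of lengths 5 and 3 cannot be combined this way
try:
    x + np.arange(3)
    mismatch_error = None
except ValueError as err:
    mismatch_error = err

print(difference, product, dot)
print(mismatch_error)
```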

In [34]:
# create a two-dimensional array (2x2)
a1=np.array([[1,2],[3,4]])
a1

array([[1, 2],
       [3, 4]])

In [35]:
# a more convenient way to build a two-dimensional array is to reshape a one-dimensional one
m=np.arange(0,20).reshape(5,4)

In [36]:
m

array([[ 0,  1,  2,  3],
       [ 4,  5,  6,  7],
       [ 8,  9, 10, 11],
       [12, 13, 14, 15],
       [16, 17, 18, 19]])

In [38]:
# the size of an array of any dimension is its total number of elements
np.size(m)

20

In [39]:
# we can also ask for the size along a given axis (axis 0 is rows)
np.size(m,0)

5

In [40]:
np.size(m,1)

4
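
The per-axis sizes are also available together through the `shape` attribute, and `reshape` accepts `-1` for one dimension to have it inferred. A minimal sketch (the names `rows`, `cols`, and `inferred` are illustrative):

```python
import numpy as np

m = np.arange(0, 20).reshape(5, 4)

# shape gives the size along every axis at once
rows, cols = m.shape

# -1 asks NumPy to infer that dimension from the total size
inferred = np.arange(0, 20).reshape(-1, 4)

# the total number of elements must match, otherwise reshape raises ValueError
try:
    np.arange(0, 20).reshape(3, 7)
    reshape_error = None
except ValueError as err:
    reshape_error = err

print(rows, cols, inferred.shape)
```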

**Selecting array elements**

In [44]:
a1=np.arange(10)

In [46]:
a1[0]

0

In [47]:
a1[0],a1[2]

(0, 2)

In [48]:
m

array([[ 0,  1,  2,  3],
       [ 4,  5,  6,  7],
       [ 8,  9, 10, 11],
       [12, 13, 14, 15],
       [16, 17, 18, 19]])

In [49]:
m[0]

array([0, 1, 2, 3])

In [50]:
# select the element at row 1 and column 2
m[1,2]

6

In [51]:
# all items in row 0
m[0,]

array([0, 1, 2, 3])

In [52]:
# all the items in column 2
m[:,2]

array([ 2,  6, 10, 14, 18])
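
Slices can be used on both axes at once to select a sub-block rather than a single row or column. A short sketch on the same 5x4 array, with illustrative variable names:

```python
import numpy as np

m = np.arange(0, 20).reshape(5, 4)

block = m[1:3, 0:2]      # rows 1 and 2, columns 0 and 1
last_row = m[-1]         # negative indices count from the end
every_other = m[::2, :]  # every second row, all columns

print(block)
print(last_row)
print(every_other)
```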

**Logical operations on arrays**

In [55]:
# which items are less than two
a=np.arange(5)
a<2

array([ True,  True, False, False, False])

In [58]:
# this is commented out because `or` raises a ValueError on arrays
# (the truth value of an array with more than one element is ambiguous)
# print(a<2 or a>3)

In [57]:
(a<2) | (a>3)

array([ True,  True, False, False,  True])
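
Element-wise logic uses `|`, `&`, and `~` in place of Python's `or`, `and`, and `not`. The parentheses matter because these operators bind more tightly than comparisons. A minimal sketch:

```python
import numpy as np

a = np.arange(5)

either = (a < 2) | (a > 3)      # element-wise OR
both = (a > 0) & (a < 4)        # element-wise AND
negated = ~(a < 2)              # element-wise NOT
same_as_either = np.logical_or(a < 2, a > 3)

# Python's `or` needs a single truth value, which an array does not have
try:
    (a < 2) or (a > 3)
    or_error = None
except ValueError as err:
    or_error = err

print(either, both, negated)
```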

In [59]:
a

array([0, 1, 2, 3, 4])

In [60]:
def exp(x):
  return x<3 or x>3

In [62]:
# np.vectorize applies the function to every item in an array
np.vectorize(exp)(a)

array([ True,  True,  True, False,  True])
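
`np.vectorize` is a convenience wrapper that still calls the Python function element by element, so it is not a performance tool. When the logic can be written with array operators, the result is the same. The sketch below uses a hypothetical function name, `not_three`, for the scalar version.

```python
import numpy as np

a = np.arange(5)

def not_three(x):
    # works on one scalar at a time
    return x < 3 or x > 3

via_vectorize = np.vectorize(not_three)(a)   # applies not_three element by element
via_operators = (a < 3) | (a > 3)            # same result using array operators

print(via_vectorize)
print(via_operators)
```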

**Boolean selection**

In [64]:
r=a<3
print(r)
a[r]

[ True  True  True False False]


array([0, 1, 2])

In [66]:
# np.sum counts True as 1 and False as 0
# so this is how many items are less than 3
np.sum(a<3)

3
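
Masks can be combined before selecting, counted, converted to positions, or used on the left-hand side of an assignment. A short sketch, with illustrative names `mask`, `positions`, and `b`:

```python
import numpy as np

a = np.arange(5)

mask = (a > 0) & (a < 4)
selected = a[mask]                 # [1 2 3]
count = np.count_nonzero(mask)     # 3
positions = np.where(mask)[0]      # indices where mask is True: [1 2 3]

# a mask can also be used to assign; work on a copy to leave `a` unchanged
b = a.copy()
b[b < 3] = 0

print(selected, count, positions, b)
```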

In [67]:
# comparisons can also be applied across two arrays
a1=np.arange(0,5)
a2=np.arange(5,0,-1)
a1<a2

array([ True,  True,  True, False, False])

In [71]:
# this also works with multidimensional arrays
a1=np.arange(0,9).reshape(3,3)
a2=np.arange(9,0,-1).reshape(3,3)

a1<a2

array([[ True,  True,  True],
       [ True,  True, False],
       [False, False, False]])

array([[0, 1, 2],
       [3, 4, 5],
       [6, 7, 8]])
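
A two-dimensional mask can be used for selection just like a one-dimensional one. The result is a flat array of the matching elements, and the mask can be summarised per axis. A minimal sketch using the same two 3x3 arrays:

```python
import numpy as np

a1 = np.arange(0, 9).reshape(3, 3)
a2 = np.arange(9, 0, -1).reshape(3, 3)

mask = a1 < a2
smaller = a1[mask]            # 1-D array of the matching elements
per_row = mask.sum(axis=1)    # number of True values in each row
any_true = mask.any()         # is at least one comparison True?
all_true = mask.all()         # are all comparisons True?

print(smaller)   # [0 1 2 3 4]
print(per_row)   # [3 2 0]
print(any_true, all_true)
```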