# NumPy Operations

## Mathematical Operations (statistical functions, arithmetic operations)

A key advantage of the NumPy library is that the structure of the ndarray object makes it possible to perform complex mathematical and statistical operations on whole arrays at once.
Some of the built-in mathematical functions are shown in this notebook:


In [1]:
import numpy as np
arr = np.arange(0,10)
arr

array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

In [2]:
arr + arr

array([ 0,  2,  4,  6,  8, 10, 12, 14, 16, 18])

In [3]:
arr * arr

array([ 0,  1,  4,  9, 16, 25, 36, 49, 64, 81])

In [4]:
arr - arr

array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0])

In [5]:
# Division by zero raises a warning, not an error!
# The undefined result 0/0 is replaced with nan
arr/arr

array([nan,  1.,  1.,  1.,  1.,  1.,  1.,  1.,  1.,  1.])

In [6]:
# Also a warning rather than an error; 1/0 gives infinity
1/arr

  


array([       inf, 1.        , 0.5       , 0.33333333, 0.25      ,
       0.2       , 0.16666667, 0.14285714, 0.125     , 0.11111111])
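If the warnings above are unwanted, the sketch below shows two possible ways to handle them. It assumes NumPy's `np.errstate` context manager and the `out`/`where` arguments of `np.divide`; the fill value 0.0 used for the skipped entry is an arbitrary choice made for illustration.

```python
import numpy as np

arr = np.arange(0, 10)

# Option 1: silence the warnings locally; nan and inf still appear in the results
with np.errstate(divide="ignore", invalid="ignore"):
    ratio = arr / arr        # 0/0 -> nan
    reciprocal = 1 / arr     # 1/0 -> inf

# Option 2: skip the division wherever the denominator is zero.
# Skipped entries keep the value already stored in `out` (0.0 here).
safe = np.divide(1.0, arr, out=np.zeros(arr.shape, dtype=float), where=arr != 0)

print(ratio)
print(reciprocal)
print(safe)
```

Option 1 keeps the nan/inf markers so they can be filtered later, while option 2 never performs the invalid division in the first place.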

In [7]:
arr**3

array([  0,   1,   8,  27,  64, 125, 216, 343, 512, 729])

## Universal Array Functions

NumPy comes with many [universal array functions](http://docs.scipy.org/doc/numpy/reference/ufuncs.html) (ufuncs), which are mathematical operations applied element-wise across an array. Let's look at some common ones:

In [8]:
#Taking Square Roots
np.sqrt(arr)

array([0.        , 1.        , 1.41421356, 1.73205081, 2.        ,
       2.23606798, 2.44948974, 2.64575131, 2.82842712, 3.        ])

In [9]:
#Calculating the exponential (e^x)
np.exp(arr)

array([1.00000000e+00, 2.71828183e+00, 7.38905610e+00, 2.00855369e+01,
       5.45981500e+01, 1.48413159e+02, 4.03428793e+02, 1.09663316e+03,
       2.98095799e+03, 8.10308393e+03])

In [10]:
np.max(arr) #same as arr.max()

9

In [11]:
np.sin(arr)

array([ 0.        ,  0.84147098,  0.90929743,  0.14112001, -0.7568025 ,
       -0.95892427, -0.2794155 ,  0.6569866 ,  0.98935825,  0.41211849])

In [12]:
np.cos(arr)

array([ 1.        ,  0.54030231, -0.41614684, -0.9899925 , -0.65364362,
        0.28366219,  0.96017029,  0.75390225, -0.14550003, -0.91113026])

In [13]:
np.tan(arr)

array([ 0.        ,  1.55740772, -2.18503986, -0.14254654,  1.15782128,
       -3.38051501, -0.29100619,  0.87144798, -6.79971146, -0.45231566])

In [14]:
# log(0) gives -inf with a warning, not an error
np.log(arr)


array([      -inf, 0.        , 0.69314718, 1.09861229, 1.38629436,
       1.60943791, 1.79175947, 1.94591015, 2.07944154, 2.19722458])

In [15]:
#Operation on two arrays
a1 = np.arange(0,10)
a2 = np.arange(10,20)
print("Array 1: ", a1)
print("Array 2: ", a2)

Array 1:  [0 1 2 3 4 5 6 7 8 9]
Array 2:  [10 11 12 13 14 15 16 17 18 19]


In [17]:
#Adding two arrays
np.add(a1, a2)

array([10, 12, 14, 16, 18, 20, 22, 24, 26, 28])

In [18]:
#Subtracting two arrays
np.subtract(a1, a2)

array([-10, -10, -10, -10, -10, -10, -10, -10, -10, -10])

In [19]:
#Multiplication of two arrays
np.multiply(a1,a2)

array([  0,  11,  24,  39,  56,  75,  96, 119, 144, 171])

In [20]:
#Division of two arrays
np.divide(a1, a2)

array([0.        , 0.09090909, 0.16666667, 0.23076923, 0.28571429,
       0.33333333, 0.375     , 0.41176471, 0.44444444, 0.47368421])
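The arithmetic operators used at the start of this notebook and the two-array functions above produce the same results. The short check below is a sketch that compares each operator with its function counterpart on the same `a1` and `a2`.

```python
import numpy as np

a1 = np.arange(0, 10)
a2 = np.arange(10, 20)

# Compare each operator with the corresponding NumPy function
same_add = np.array_equal(a1 + a2, np.add(a1, a2))
same_sub = np.array_equal(a1 - a2, np.subtract(a1, a2))
same_mul = np.array_equal(a1 * a2, np.multiply(a1, a2))
same_div = np.array_equal(a1 / a2, np.divide(a1, a2))

print(same_add, same_sub, same_mul, same_div)
```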

### Statistical Calculations
NumPy's built-in functions can perform various statistical operations on an array, as shown in the examples below:

In [21]:
arr = np.arange(10,50,2)

arr[10]=1000
arr[0]=-999
arr[4]=444
arr

array([-999,   12,   14,   16,  444,   20,   22,   24,   26,   28, 1000,
         32,   34,   36,   38,   40,   42,   44,   46,   48])

In [22]:
np.mean(arr)

48.35

In [23]:
np.median(arr)

33.0

In [24]:
np.max(arr)

1000

In [25]:
np.min(arr)

-999

In [26]:
np.std(arr)

329.1305629989411

In [27]:
np.var(arr)

108326.92749999996
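By default, `np.std` and `np.var` divide by the number of elements (the population formulas). When the array is a sample and the n - 1 divisor is wanted, the `ddof` argument can be set to 1. The sketch below rebuilds the same array and compares the two conventions.

```python
import numpy as np

arr = np.arange(10, 50, 2)
arr[10] = 1000
arr[0] = -999
arr[4] = 444

n = arr.size
population_var = np.var(arr)        # divides by n (ddof=0, the default)
sample_var = np.var(arr, ddof=1)    # divides by n - 1

print(population_var, sample_var)
print(np.std(arr), np.std(arr, ddof=1))
```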

In [28]:
np.percentile(arr,10)

13.8

In [375]:
# Anything below Q1 - 1.5*IQR or above Q3 + 1.5*IQR is an outlier
# IQR = Q3 - Q1

In [37]:
def removeOutlier(arr):
    # Q1 and Q3 are the 25th and 75th percentiles
    q1=np.percentile(arr,25)
    q3=np.percentile(arr,75)
    iqr=q3-q1
    # Keep only the values strictly inside (Q1 - 1.5*IQR, Q3 + 1.5*IQR)
    return arr[(arr>(q1-1.5*iqr))&(arr<(q3+1.5*iqr))]

In [38]:
print(arr)
ans=removeOutlier(arr)
ans

[-999   12   14   16  444   20   22   24   26   28 1000   32   34   36
   38   40   42   44   46   48]


array([12, 14, 16, 20, 22, 24, 26, 28, 32, 34, 36, 38, 40, 42, 44, 46, 48])

### Arithmetic operations on a matrix

In [39]:
np.random.randint?

In [53]:
data=np.random.randint(1,10,(10,6))
data

array([[5, 5, 6, 5, 2, 6],
       [8, 9, 2, 5, 3, 5],
       [5, 3, 2, 2, 8, 9],
       [9, 4, 4, 6, 5, 2],
       [9, 5, 3, 2, 6, 8],
       [2, 7, 7, 6, 8, 1],
       [1, 4, 8, 2, 3, 9],
       [1, 6, 5, 2, 4, 8],
       [2, 3, 7, 9, 2, 8],
       [5, 9, 5, 6, 8, 9]])

In [54]:
data.shape

(10, 6)

In [55]:
np.mean(data)

5.166666666666667

In [56]:
np.mean(data,axis=0)

array([4.7, 5.5, 4.9, 4.5, 4.9, 6.5])

In [57]:
np.median(data,axis=1)

array([5. , 5. , 4. , 4.5, 5.5, 6.5, 3.5, 4.5, 5. , 7. ])
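The `axis` argument used above selects the dimension that is collapsed: `axis=0` aggregates down the rows and returns one value per column, while `axis=1` aggregates across the columns and returns one value per row. The sketch below uses a small fixed matrix (an illustrative stand-in for the random data) so the result is easy to verify by hand.

```python
import numpy as np

# Small fixed matrix so the result is reproducible
m = np.array([[1, 2, 3],
              [4, 5, 6]])

col_means = np.mean(m, axis=0)   # collapse the rows: one value per column
row_means = np.mean(m, axis=1)   # collapse the columns: one value per row

print(col_means)   # [2.5 3.5 4.5]
print(row_means)   # [2. 5.]
```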

In [58]:
data[:,5]=data[:,2]*data[:,3]

In [59]:
data

array([[ 5,  5,  6,  5,  2, 30],
       [ 8,  9,  2,  5,  3, 10],
       [ 5,  3,  2,  2,  8,  4],
       [ 9,  4,  4,  6,  5, 24],
       [ 9,  5,  3,  2,  6,  6],
       [ 2,  7,  7,  6,  8, 42],
       [ 1,  4,  8,  2,  3, 16],
       [ 1,  6,  5,  2,  4, 10],
       [ 2,  3,  7,  9,  2, 63],
       [ 5,  9,  5,  6,  8, 30]])

In [60]:
data=np.random.randint(1,10,(12,3))
data.shape

(12, 3)

In [61]:
data

array([[8, 5, 4],
       [5, 8, 3],
       [5, 6, 1],
       [5, 4, 9],
       [4, 1, 6],
       [5, 9, 7],
       [3, 2, 1],
       [8, 2, 6],
       [3, 8, 2],
       [4, 7, 9],
       [5, 1, 9],
       [1, 2, 8]])

In [62]:
np.corrcoef(data,rowvar=False)

array([[ 1.        ,  0.05183644, -0.01974111],
       [ 0.05183644,  1.        , -0.24477553],
       [-0.01974111, -0.24477553,  1.        ]])

# Shallow and deep copy

A shallow copy means constructing a new collection object and then populating it with references to the child objects found in the original. In essence, a shallow copy is only one level deep. The copying process does not recurse and therefore won’t create copies of the child objects themselves.

A deep copy makes the copying process recursive. It means first constructing a new collection object and then recursively populating it with copies of the child objects found in the original. Copying an object this way walks the whole object tree to create a fully independent clone of the original object and all of its children.

In [63]:
# Plain assignment does not copy: ys is a second name for the same list
xs = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
ys=xs


In [64]:
ys.append([10,11,12])
ys

[[1, 2, 3], [4, 5, 6], [7, 8, 9], [10, 11, 12]]

In [65]:
xs

[[1, 2, 3], [4, 5, 6], [7, 8, 9], [10, 11, 12]]

In [66]:
# list.copy() makes a shallow copy, like copy.copy()
import copy
xs = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
ys=xs.copy()
ys.append([10,11,12])
print(ys)
print(xs)

[[1, 2, 3], [4, 5, 6], [7, 8, 9], [10, 11, 12]]
[[1, 2, 3], [4, 5, 6], [7, 8, 9]]


In [72]:
#shallow copy
xs = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
ys = copy.copy(xs)

In [73]:
ys.append([10,11,12])
print(ys)
print(xs)

[[1, 2, 3], [4, 5, 6], [7, 8, 9], [10, 11, 12]]
[[1, 2, 3], [4, 5, 6], [7, 8, 9]]


In [74]:
ys[0][0]=(['new sublist'])
print(ys)
print(xs)

[[['new sublist'], 2, 3], [4, 5, 6], [7, 8, 9], [10, 11, 12]]
[[['new sublist'], 2, 3], [4, 5, 6], [7, 8, 9]]


In [75]:
#deep copy
xs = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
ys = copy.deepcopy(xs)

In [76]:
xs[0][0]=(['new sublist'])
print(xs)
print(ys)

[[['new sublist'], 2, 3], [4, 5, 6], [7, 8, 9]]
[[1, 2, 3], [4, 5, 6], [7, 8, 9]]
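A similar distinction exists for NumPy arrays. As a sketch, assuming standard NumPy behaviour: basic slicing returns a view that shares data with the original array, while `.copy()` returns an independent array. The variable names below are illustrative only.

```python
import numpy as np

original = np.arange(5)

view = original[1:4]                  # basic slicing: a view on the same data
independent = original[1:4].copy()    # .copy(): newly allocated data

original[2] = 99

print(view)          # reflects the change made through `original`
print(independent)   # unaffected
print(np.shares_memory(original, view), np.shares_memory(original, independent))
```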


# Broadcasting

In [77]:
a=[ 1,  2,  0]

In [78]:
print(type(a))
print(a*2)
print(a+10)

<class 'list'>
[1, 2, 0, 1, 2, 0]


TypeError: can only concatenate list (not "int") to list

In [79]:
a = np.array([1,2,3,4]) 
b = np.array([10,20,30,40]) 
c = a * b 
c

array([ 10,  40,  90, 160])

In [80]:
a.dot(b)

300

The term broadcasting describes how NumPy treats arrays with different shapes during arithmetic operations. Subject to certain constraints, the smaller array is “broadcast” across the larger array so that they have compatible shapes.
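The constraint can be stated precisely: shapes are compared from the last dimension backwards, and each pair of dimensions must either be equal or contain a 1 (a missing dimension counts as 1). The helper `can_broadcast` below is hypothetical, written only to illustrate that rule; it is not part of NumPy.

```python
from itertools import zip_longest

def can_broadcast(shape_a, shape_b):
    """Return True if two shapes are compatible under the broadcasting rule.

    Hypothetical helper for illustration; not a NumPy function.
    """
    # Walk both shapes from the last axis backwards; a missing axis counts as 1
    pairs = zip_longest(reversed(shape_a), reversed(shape_b), fillvalue=1)
    for dim_a, dim_b in pairs:
        if dim_a != dim_b and dim_a != 1 and dim_b != 1:
            return False
    return True

print(can_broadcast((4, 3), (3,)))          # True
print(can_broadcast((2, 3), (2,)))          # False
print(can_broadcast((3, 3, 3), (1, 3, 3)))  # True
```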

In [81]:
a = np.array([[0.0,0.0,0.0],[10.0,10.0,10.0],[20.0,20.0,20.0],[30.0,30.0,30.0]]) 

In [82]:
a

array([[ 0.,  0.,  0.],
       [10., 10., 10.],
       [20., 20., 20.],
       [30., 30., 30.]])

In [84]:
b=np.array([1,2,3])
b

array([1, 2, 3])

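In [86]:
# Matrix-vector product of a (4 x 3) with b (3); this uses no broadcasting
a.dot(b)

array([  0.,  60., 120., 180.])

In [88]:
# b written out row by row to match a's shape (4 x 3);
# broadcasting behaves as if this array had been built
b=np.array([[1,2,3],[1,2,3],[1,2,3],[1,2,3]])
b

array([[1, 2, 3],
       [1, 2, 3],
       [1, 2, 3],
       [1, 2, 3]])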

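When the two shapes have different lengths, NumPy first pads the shorter one with a leading 1. For example,

A.shape = (2 x 3)

b.shape = (3)

becomes

A.shape = (2 x 3)

b.shape = (1 x 3)

and b is then stretched along its first axis to match A.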

In [89]:
a = np.array([[0.0,0.0,0.0],[10.0,10.0,10.0],[20.0,20.0,20.0],[30.0,30.0,30.0]]) 
b = np.array([1.0,2.0,3.0]) 

In [90]:
a

array([[ 0.,  0.,  0.],
       [10., 10., 10.],
       [20., 20., 20.],
       [30., 30., 30.]])

In [91]:
b

array([1., 2., 3.])

In [92]:
a+b

array([[ 1.,  2.,  3.],
       [11., 12., 13.],
       [21., 22., 23.],
       [31., 32., 33.]])

In [93]:
A = np.array([[1, 2, 3], [1, 2, 3]])
print(A.shape)
b = np.array([1, 2])
print(b.shape)
C = A + b
print(C)

(2, 3)
(2,)


ValueError: operands could not be broadcast together with shapes (2,3) (2,) 

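Broadcasting fails here because, after padding, the trailing dimensions differ (3 versus 2) and neither is 1:

A.shape = (2 x 3)

b.shape = (1 x 2)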
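One possible way to make the failing addition work, assuming the intent was to add `b[i]` to every element of row `i` of `A`, is to reshape `b` into a column of shape (2, 1) with `np.newaxis`. This is a sketch of that assumption, not the only valid fix.

```python
import numpy as np

A = np.array([[1, 2, 3], [1, 2, 3]])   # shape (2, 3)
b = np.array([1, 2])                    # shape (2,)

# Turn b into a column of shape (2, 1) so it can be stretched along the columns
b_col = b[:, np.newaxis]
C = A + b_col

print(b_col.shape)
print(C)
```

With shapes (2, 3) and (2, 1), the trailing dimensions are 3 and 1, so the rule is satisfied and the column is repeated across all three columns of `A`.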

In [95]:
# create tensor
from numpy import array
T = array([
  [[1,2,3],    [4,5,6],    [7,8,9]],
  [[11,12,13], [14,15,16], [17,18,19]],
  [[21,22,23], [24,25,26], [27,28,29]],
  ])
print(T.shape)
print(T)

(3, 3, 3)
[[[ 1  2  3]
  [ 4  5  6]
  [ 7  8  9]]

 [[11 12 13]
  [14 15 16]
  [17 18 19]]

 [[21 22 23]
  [24 25 26]
  [27 28 29]]]


In [96]:
U=array([
  [[1,2,3],    [4,5,6],    [7,8,9]]])
print(U.shape)
print(U)

(1, 3, 3)
[[[1 2 3]
  [4 5 6]
  [7 8 9]]]


In [97]:
print(T+U)

[[[ 2  4  6]
  [ 8 10 12]
  [14 16 18]]

 [[12 14 16]
  [18 20 22]
  [24 26 28]]

 [[22 24 26]
  [28 30 32]
  [34 36 38]]]


# Sorting

In [98]:

a = np.array([[3,7,9,1]])
print(a)
# np.sort returns a sorted copy (along the last axis by default)
print(np.sort(a))

[[3 7 9 1]]
[[1 3 7 9]]


In [104]:
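x = np.array([0.20, 0.79, 0.01])
# 1-based indices of the elements, from largest to smallest
print(np.argsort(-x)+1)
# 0-based indices of the elements, from smallest to largest
print(np.argsort(x))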

[2 1 3]
[2 0 1]
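The indices returned by `np.argsort` can be used to reorder an array, which is handy for descending sorts or for sorting the rows of a table by one column. The sketch below reuses `x` from above; the `table` array is made up for illustration.

```python
import numpy as np

x = np.array([0.20, 0.79, 0.01])

order = np.argsort(x)           # indices that sort x in ascending order
ascending = x[order]            # same values as np.sort(x)
descending = x[order[::-1]]     # reversed index order gives descending values

# Sort the rows of a 2-D array by their first column
table = np.array([[3, 30],
                  [1, 10],
                  [2, 20]])
by_first_col = table[np.argsort(table[:, 0])]

print(ascending)
print(descending)
print(by_first_col)
```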


### Stacking

Let's look at some more advanced methods available in NumPy.

Stacking: Several arrays can be stacked together along different axes. 
- np.vstack: To stack arrays along vertical axis.
- np.hstack: To stack arrays along horizontal axis.
- np.concatenate: To stack arrays along specified axis (axis is passed as argument).

In [105]:
import numpy as np 
  
a = np.array([[1, 2], 
              [3, 4]]) 
  
b = np.array([[5, 6], 
              [7, 8]]) 
  
# vertical stacking 
print("Vertical stacking:\n", np.vstack((a, b))) 
  
# horizontal stacking 
print("\nHorizontal stacking:\n", np.hstack((a, b))) 
  
  

Vertical stacking:
 [[1 2]
 [3 4]
 [5 6]
 [7 8]]

Horizontal stacking:
 [[1 2 5 6]
 [3 4 7 8]]


In [108]:

# concatenation method  
print("\nConcatenating to 1st axis:\n", np.concatenate((a, b), 0)) 


Concatenating to 1st axis:
 [[1 2]
 [3 4]
 [5 6]
 [7 8]]


In [109]:

# concatenation method  
print("\nConcatenating to 2nd axis:\n", np.concatenate((a, b), 1)) 


Concatenating to 2nd axis:
 [[1 2 5 6]
 [3 4 7 8]]
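For 2-D inputs such as these, the outputs above show that `np.vstack` matches concatenation along axis 0 and `np.hstack` matches concatenation along axis 1. The sketch below checks this with the same `a` and `b` and prints the resulting shapes.

```python
import numpy as np

a = np.array([[1, 2], [3, 4]])
b = np.array([[5, 6], [7, 8]])

v = np.concatenate((a, b), axis=0)   # rows of b placed below a -> shape (4, 2)
h = np.concatenate((a, b), axis=1)   # columns of b placed right of a -> shape (2, 4)

print(v.shape, np.array_equal(v, np.vstack((a, b))))
print(h.shape, np.array_equal(h, np.hstack((a, b))))
```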


### Splitting

For splitting, we have these functions:
- np.hsplit: Split array along horizontal axis.
- np.vsplit: Split array along vertical axis.

In [111]:
import numpy as np 
  
a = np.array([[1, 3, 5, 7, 9, 11], 
              [2, 4, 6, 8, 10, 12]]) 
  
# horizontal splitting
  
x,y=np.hsplit(a, 2)
x,y

(array([[1, 3, 5],
        [2, 4, 6]]), array([[ 7,  9, 11],
        [ 8, 10, 12]]))

In [112]:
# vertical splitting 
print("\nSplitting along vertical axis into 2 parts:\n", np.vsplit(a, 2))


Splitting along vertical axis into 2 parts:
 [array([[ 1,  3,  5,  7,  9, 11]]), array([[ 2,  4,  6,  8, 10, 12]])]
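Besides a number of equal parts, `np.hsplit` also accepts a list of column indices at which to cut, which allows pieces of unequal width. The sketch below splits the same array before columns 1 and 4 (indices chosen arbitrarily for illustration) and then stacks the pieces back together.

```python
import numpy as np

a = np.array([[1, 3, 5, 7, 9, 11],
              [2, 4, 6, 8, 10, 12]])

# Cut before column 1 and before column 4: widths 1, 3 and 2
left, middle, right = np.hsplit(a, [1, 4])

# Stacking the pieces horizontally recovers the original array
restored = np.hstack((left, middle, right))

print(left.shape, middle.shape, right.shape)
print(np.array_equal(restored, a))
```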


### Working with datetime
NumPy has core array data types that natively support datetime functionality. The data type is called “datetime64”, so named because “datetime” is already taken by the datetime library included in Python.
Consider the examples below:

In [114]:
# creating a date 
date = np.datetime64('2017-02-12') 
print("Date is:", date) 
print("Year is:", np.datetime64(date, 'Y')) 
print("Month-Year is:", np.datetime64(date, 'M'))
  

Date is: 2017-02-12
Year is: 2017
Month-Year is: 2017-02


In [116]:

# creating array of dates in a month 
dates = np.arange('2017-02', '2017-03', dtype='datetime64[D]') 
print("\nDates of February, 2017:\n", dates) 
  


Dates of February, 2017:
 ['2017-02-01' '2017-02-02' '2017-02-03' '2017-02-04' '2017-02-05'
 '2017-02-06' '2017-02-07' '2017-02-08' '2017-02-09' '2017-02-10'
 '2017-02-11' '2017-02-12' '2017-02-13' '2017-02-14' '2017-02-15'
 '2017-02-16' '2017-02-17' '2017-02-18' '2017-02-19' '2017-02-20'
 '2017-02-21' '2017-02-22' '2017-02-23' '2017-02-24' '2017-02-25'
 '2017-02-26' '2017-02-27' '2017-02-28']


In [118]:

# arithmetic operation on dates 
dur = np.datetime64('2017-05-22') - np.datetime64('2016-05-22') 
print("\nNo. of days:", dur) 
print("No. of weeks:", np.timedelta64(dur, 'W')) 
  



No. of days: 365 days
No. of weeks: 52 weeks


In [119]:
# sorting dates 
a = np.array(['2017-02-12', '2016-10-13', '2019-05-22'], dtype='datetime64') 
print("\nDates in sorted order:", np.sort(a))


Dates in sorted order: ['2016-10-13' '2017-02-12' '2019-05-22']
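Date arithmetic also works on whole arrays. As a sketch, the gaps between consecutive sorted dates can be computed with `np.diff`, which returns `timedelta64` values; the explicit `datetime64[D]` unit is an assumption added here so that the gaps are expressed in days.

```python
import numpy as np

dates = np.array(['2017-02-12', '2016-10-13', '2019-05-22'], dtype='datetime64[D]')
ordered = np.sort(dates)

gaps = np.diff(ordered)                # timedelta64[D] between consecutive dates
gap_days = gaps.astype(np.int64)       # plain integer day counts

print(gaps)
print(gap_days)
```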
