## NumPy Statistical Functions

### Spread

#### Mean Absolute Deviation

##### <center>Average of Absolute Deviations</center>
### <center>Formula</center>
## $$MAD = \frac{\sum_{i=1}^{N} |x_i - \mu|}{N}$$
<center><i> N: number of observations <br> </i></center>
<center><i> $x_i$ : value of the $i$-th observation</i></center>
<center><i> $\mu$ : Mean</i></center>
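Before switching to NumPy, it can help to see the formula translated term by term. The following is a minimal plain-Python sketch. The function name `mad_from_formula` is hypothetical and not part of any library, and it uses the population form shown above (dividing by $N$).

```python
def mad_from_formula(values):
    """Mean absolute deviation, computed directly from the formula (divide by N)."""
    n = len(values)                                # N: number of observations
    mu = sum(values) / n                           # mu: the mean
    total = sum(abs(x_i - mu) for x_i in values)   # sum of |x_i - mu|
    return total / n                               # divide by N


print(mad_from_formula([4, 8, 10, 12, 15, 19, 20, 24]))  # prints 5.5
```

For this data the mean is 112 / 8 = 14, the absolute deviations sum to 44, and 44 / 8 = 5.5, which matches the NumPy result computed in the cells that follow.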

In [1]:
import numpy as np

1D Array

In [2]:
y = np.array([4, 8, 10, 12, 15, 19, 20, 24])

In [3]:
#compute mean
mean = np.mean(y)
mean

14.0

In [4]:
#compute absolute difference
abs_diff = np.abs(y - mean)
abs_diff

array([10.,  6.,  4.,  2.,  1.,  5.,  6., 10.])

In [5]:
#compute MAD
mad = np.mean(abs_diff) 
print('Mean Absolute Deviation:',mad)

Mean Absolute Deviation: 5.5


In [6]:
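def compute_mad_1D(arr):
    """Mean absolute deviation of all elements in arr."""
    mean = np.mean(arr)               # mu
    abs_diff = np.abs(arr - mean)     # |x_i - mu| for every element
    mad = np.mean(abs_diff)           # average of the absolute deviations
    return mad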

In [7]:
compute_mad_1D(y)

5.5

2D Array

In [8]:
np.random.seed(5)
x = np.random.randint(low=15, high=100, size=(7, 4))
x

array([[93, 76, 31, 88],
       [23, 77, 42, 45],
       [95, 22, 91, 30],
       [68, 95, 42, 59],
       [92, 90, 80, 62],
       [45, 99, 33, 24],
       [56, 77, 16, 97]])

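Problem with the return shape of NumPy statistical functions such as np.mean(), and its solution

By default, np.mean() drops the reduced axis, so it returns a 1D array for both axis=0 and axis=1: shape (4,) for axis=0 and shape (7,) for axis=1. Under NumPy's broadcasting rules, a 1D array is aligned with the last axis (the columns) of x. For axis=0 this works, because the 4 column means line up with the 4 columns. For axis=1 the subtraction (x - mean) fails with a ValueError, because 7 row means cannot be aligned with 4 columns. Passing keepdims=True keeps the reduced axis with length 1 (shape (7, 1) for axis=1), which broadcasts correctly.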
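A quick way to see the shape issue is to print the shapes directly and trigger the failing subtraction. This is a small sketch; the values of `x` are hard-coded from the array displayed above so that it runs without depending on the random seed.

```python
import numpy as np

# Same values as the 7x4 array x shown above (hard-coded for a self-contained sketch)
x = np.array([[93, 76, 31, 88],
              [23, 77, 42, 45],
              [95, 22, 91, 30],
              [68, 95, 42, 59],
              [92, 90, 80, 62],
              [45, 99, 33, 24],
              [56, 77, 16, 97]])

print(np.mean(x, axis=0).shape)                 # (4,)   lines up with the 4 columns
print(np.mean(x, axis=1).shape)                 # (7,)   cannot line up with the 4 columns
print(np.mean(x, axis=1, keepdims=True).shape)  # (7, 1) broadcasts against (7, 4)

try:
    x - np.mean(x, axis=1)
    broadcast_failed = False
except ValueError as err:
    broadcast_failed = True
    print('ValueError:', err)
```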

In [9]:
#for axis=0, numpy subtraction works fine
mean = np.mean(x,axis=0)
print('mean')
print(mean)
np.abs(x - mean)

mean
[67.42857143 76.57142857 47.85714286 57.85714286]


array([[25.57142857,  0.57142857, 16.85714286, 30.14285714],
       [44.42857143,  0.42857143,  5.85714286, 12.85714286],
       [27.57142857, 54.57142857, 43.14285714, 27.85714286],
       [ 0.57142857, 18.42857143,  5.85714286,  1.14285714],
       [24.57142857, 13.42857143, 32.14285714,  4.14285714],
       [22.42857143, 22.42857143, 14.85714286, 33.85714286],
       [11.42857143,  0.42857143, 31.85714286, 39.14285714]])

In [10]:
# check one sample column to confirm the broadcast worked
np.abs(x[:,0] - 67.42857143)

array([25.57142857, 44.42857143, 27.57142857,  0.57142857, 24.57142857,
       22.42857143, 11.42857143])
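Checking a single column is a spot check. The sketch below compares every column of the broadcast result against a column-by-column computation that uses no cross-column broadcasting. `x` is hard-coded from the output above, and `column_by_column` is only an illustrative name.

```python
import numpy as np

x = np.array([[93, 76, 31, 88],
              [23, 77, 42, 45],
              [95, 22, 91, 30],
              [68, 95, 42, 59],
              [92, 90, 80, 62],
              [45, 99, 33, 24],
              [56, 77, 16, 97]])

mean = np.mean(x, axis=0)            # shape (4,), one mean per column
broadcast_result = np.abs(x - mean)  # (7, 4) - (4,) broadcasts row by row

# Rebuild the same (7, 4) matrix one column at a time
column_by_column = np.column_stack(
    [np.abs(x[:, j] - mean[j]) for j in range(x.shape[1])]
)

print(np.allclose(broadcast_result, column_by_column))  # True
```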

In [11]:
# without keepdims=True, np.mean(x, axis=1) would return a 1D array of shape (7,),
# and (x - mean) would raise a ValueError: shapes (7,4) and (7,) cannot be broadcast
mean = np.mean(x,axis=1, keepdims=True)
print(mean)
np.abs(x - mean)

[[72.  ]
 [46.75]
 [59.5 ]
 [66.  ]
 [81.  ]
 [50.25]
 [61.5 ]]


array([[21.  ,  4.  , 41.  , 16.  ],
       [23.75, 30.25,  4.75,  1.75],
       [35.5 , 37.5 , 31.5 , 29.5 ],
       [ 2.  , 29.  , 24.  ,  7.  ],
       [11.  ,  9.  ,  1.  , 19.  ],
       [ 5.25, 48.75, 17.25, 26.25],
       [ 5.5 , 15.5 , 45.5 , 35.5 ]])

In [12]:
# check one sample row to confirm the broadcast worked
np.abs(x[0,:] - 72)

array([21,  4, 41, 16])
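keepdims=True is not the only way to get a column-shaped mean. Indexing the 1D result with `np.newaxis` reshapes it from (7,) to (7, 1) by hand. This sketch, with `x` hard-coded from above, shows that both approaches give the same absolute differences.

```python
import numpy as np

x = np.array([[93, 76, 31, 88],
              [23, 77, 42, 45],
              [95, 22, 91, 30],
              [68, 95, 42, 59],
              [92, 90, 80, 62],
              [45, 99, 33, 24],
              [56, 77, 16, 97]])

row_mean = np.mean(x, axis=1)                        # shape (7,)
via_newaxis = np.abs(x - row_mean[:, np.newaxis])    # reshaped to (7, 1) by indexing
via_keepdims = np.abs(x - np.mean(x, axis=1, keepdims=True))  # (7, 1) directly

print(np.allclose(via_newaxis, via_keepdims))  # True
```

keepdims=True is usually the more general choice, because it works unchanged for any axis, whereas the `np.newaxis` position has to match the axis that was reduced.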

In [13]:
def compute_mad(arr,axis=None):
    # By default, NumPy reductions such as np.mean drop the reduced axis, so
    # axis=1 returns shape (rows,) instead of (rows, 1). That 1D result is aligned
    # with the columns of arr, not its rows, so (arr - mean) fails (or, for a
    # square array, silently subtracts the wrong values).

    # keepdims=True keeps the reduced axis with length 1: the mean has shape
    # (1, cols) for axis=0, (rows, 1) for axis=1 and (1, 1) for axis=None,
    # so it always broadcasts correctly against arr.
    mean = np.mean(arr,axis=axis,keepdims=True)
    print('Mean')
    print(mean)
    abs_diff = np.abs(arr-mean)
    print()
    print('Absolute difference')
    print(abs_diff)
    mad  = np.mean(abs_diff,axis=axis) 
    return mad
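The print statements in `compute_mad` are useful for following the steps, but they get in the way when the function is reused. This is a quiet variant under the same keepdims approach. The name `mean_abs_dev` is hypothetical, and `x` is hard-coded from the array displayed earlier.

```python
import numpy as np

def mean_abs_dev(arr, axis=None):
    """Mean absolute deviation along `axis` (None = over all elements), without printing."""
    arr = np.asarray(arr, dtype=float)
    # keepdims=True keeps the reduced axis with length 1 so the mean broadcasts against arr
    mean = np.mean(arr, axis=axis, keepdims=True)
    return np.mean(np.abs(arr - mean), axis=axis)

x = np.array([[93, 76, 31, 88],
              [23, 77, 42, 45],
              [95, 22, 91, 30],
              [68, 95, 42, 59],
              [92, 90, 80, 62],
              [45, 99, 33, 24],
              [56, 77, 16, 97]])

print(mean_abs_dev(x, axis=0))  # one value per column
print(mean_abs_dev(x, axis=1))  # one value per row
print(mean_abs_dev(x))          # a single value over all 28 elements
```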

In [14]:
x

array([[93, 76, 31, 88],
       [23, 77, 42, 45],
       [95, 22, 91, 30],
       [68, 95, 42, 59],
       [92, 90, 80, 62],
       [45, 99, 33, 24],
       [56, 77, 16, 97]])

In [15]:
mad = compute_mad(x,axis=0)
print('\nMean Absolute Deviation:',mad)

Mean
[[67.42857143 76.57142857 47.85714286 57.85714286]]

Absolute difference
[[25.57142857  0.57142857 16.85714286 30.14285714]
 [44.42857143  0.42857143  5.85714286 12.85714286]
 [27.57142857 54.57142857 43.14285714 27.85714286]
 [ 0.57142857 18.42857143  5.85714286  1.14285714]
 [24.57142857 13.42857143 32.14285714  4.14285714]
 [22.42857143 22.42857143 14.85714286 33.85714286]
 [11.42857143  0.42857143 31.85714286 39.14285714]]

Mean Absolute Deviation: [22.36734694 15.75510204 21.51020408 21.30612245]
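The first value can be checked by hand with exact arithmetic. This sketch applies the formula to the first column of `x` using Python's `fractions.Fraction`, so no rounding is involved: the column sum is 472, the mean is 472/7, and the absolute deviations sum to 1096/7.

```python
from fractions import Fraction

column0 = [93, 23, 95, 68, 92, 45, 56]   # first column of x

mu = Fraction(sum(column0), len(column0))   # 472/7
abs_devs = [abs(v - mu) for v in column0]   # each |x_i - mu| as an exact fraction
mad = sum(abs_devs) / len(column0)          # (1096/7) / 7 = 1096/49

print(mu, mad, float(mad))
```

1096/49 is approximately 22.3673469, matching the first entry of the axis=0 result.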


In [16]:
x

array([[93, 76, 31, 88],
       [23, 77, 42, 45],
       [95, 22, 91, 30],
       [68, 95, 42, 59],
       [92, 90, 80, 62],
       [45, 99, 33, 24],
       [56, 77, 16, 97]])

In [17]:
mad = compute_mad(x,axis=1)
print('\nMean Absolute Deviation:',mad)

Mean
[[72.  ]
 [46.75]
 [59.5 ]
 [66.  ]
 [81.  ]
 [50.25]
 [61.5 ]]

Absolute difference
[[21.    4.   41.   16.  ]
 [23.75 30.25  4.75  1.75]
 [35.5  37.5  31.5  29.5 ]
 [ 2.   29.   24.    7.  ]
 [11.    9.    1.   19.  ]
 [ 5.25 48.75 17.25 26.25]
 [ 5.5  15.5  45.5  35.5 ]]

Mean Absolute Deviation: [20.5   15.125 33.5   15.5   10.    24.375 25.5  ]
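As a cross-check, the per-row result should match running the 1D function on each row separately. This sketch uses `np.apply_along_axis` with a local copy of `compute_mad_1D` (redefined so the block is self-contained) and the hard-coded `x`.

```python
import numpy as np

def compute_mad_1D(arr):
    # same steps as compute_mad_1D defined earlier in this notebook
    mean = np.mean(arr)
    return np.mean(np.abs(arr - mean))

x = np.array([[93, 76, 31, 88],
              [23, 77, 42, 45],
              [95, 22, 91, 30],
              [68, 95, 42, 59],
              [92, 90, 80, 62],
              [45, 99, 33, 24],
              [56, 77, 16, 97]])

# Call the 1D function once per row (axis=1)
per_row = np.apply_along_axis(compute_mad_1D, 1, x)

# Vectorized version based on keepdims=True
keepdims_result = np.mean(np.abs(x - np.mean(x, axis=1, keepdims=True)), axis=1)

print(per_row)
print(np.allclose(per_row, keepdims_result))  # True
```

`np.apply_along_axis` loops over rows in Python, so the keepdims version is the one to prefer for larger arrays. The loop is only used here as an independent check.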


In [18]:
x

array([[93, 76, 31, 88],
       [23, 77, 42, 45],
       [95, 22, 91, 30],
       [68, 95, 42, 59],
       [92, 90, 80, 62],
       [45, 99, 33, 24],
       [56, 77, 16, 97]])

In [19]:
#MAD for axis=None
mad = compute_mad(x)
print('\nMean Absolute Deviation:',mad)

Mean
[[62.42857143]]

Absolute difference
[[30.57142857 13.57142857 31.42857143 25.57142857]
 [39.42857143 14.57142857 20.42857143 17.42857143]
 [32.57142857 40.42857143 28.57142857 32.42857143]
 [ 5.57142857 32.57142857 20.42857143  3.42857143]
 [29.57142857 27.57142857 17.57142857  0.42857143]
 [17.42857143 36.57142857 29.42857143 38.42857143]
 [ 6.42857143 14.57142857 46.42857143 34.57142857]]

Mean Absolute Deviation: 24.571428571428573
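With axis=None every element is treated as one observation, so the result should equal what `compute_mad_1D` returns for the same 2D array. There, np.mean without an axis already averages over all elements. This sketch checks that against the flattened array, with a local copy of `compute_mad_1D` and `x` hard-coded as before.

```python
import numpy as np

def compute_mad_1D(arr):
    # same steps as compute_mad_1D defined earlier in this notebook
    mean = np.mean(arr)
    return np.mean(np.abs(arr - mean))

x = np.array([[93, 76, 31, 88],
              [23, 77, 42, 45],
              [95, 22, 91, 30],
              [68, 95, 42, 59],
              [92, 90, 80, 62],
              [45, 99, 33, 24],
              [56, 77, 16, 97]])

print(compute_mad_1D(x))          # scalar over all 28 elements of the 2D array
print(compute_mad_1D(x.ravel()))  # same computation on the flattened 1D array
```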
