# Chebyshev's inequality

Let $X$ be a random variable with mean $\mu$ and finite variance $\sigma^2$. Chebyshev's inequality states that, for any $k > 0$,
$$
P(\left|X-\mu\right| > k \sigma ) \leq \frac{1}{k^2}
$$

For example, for an arbitrary population we can say for sure that the probability that a randomly chosen value lies more than $2\sigma$ away from the mean $\mu$ (on either side) is at most 25%.
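
To see how quickly the bound tightens as $k$ grows, the short sketch below tabulates $1/k^2$ for a few values of $k$. The helper name `chebyshev_bound` is hypothetical and is not part of the lecture code. Capping the value at 1 for $k < 1$ is an added convenience, since a probability can never exceed 1.

```python
def chebyshev_bound(k):
    """Upper bound on P(|X - mu| > k*sigma) given by Chebyshev's inequality."""
    if k <= 0:
        raise ValueError("k must be positive")
    # For k < 1 the raw value 1/k**2 exceeds 1, which carries no information,
    # so the bound is capped at 1 (an assumption of this sketch).
    return min(1.0, 1.0 / k**2)

# Tabulate the bound for the values of k used later in this lecture
for k in (1, 2, 3, 4, 10):
    print("k = %d: at most %.4f" % (k, chebyshev_bound(k)))
```

For $k \leq 1$ the inequality is trivially true, so it only becomes informative for $k > 1$.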

In this lecture, we are going to verify Chebyshev's inequality numerically.

In [112]:
# import packages
import numpy as np
import pandas as pd

In [113]:
def check_inside_interval(x=0.0, mean=0.0, sigma=1.0, k=1.0):
    '''
    A function that checks whether "mean - k*sigma < x < mean + k*sigma" is true or false
    
    '''
    # The comparison already evaluates to True or False, so return it directly
    return abs(x - mean) < k*sigma
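
The same test can be applied to a whole array at once instead of one value at a time. The following is a minimal sketch, assuming the samples are held in a NumPy array; the function name `fraction_outside` is hypothetical and is not used elsewhere in this lecture.

```python
import numpy as np

def fraction_outside(data, mean=0.0, sigma=1.0, k=1.0):
    """Fraction of samples that are NOT strictly inside (mean - k*sigma, mean + k*sigma)."""
    data = np.asarray(data, dtype=float)
    # Element-wise version of the scalar interval test
    inside = np.abs(data - mean) < k * sigma
    # The mean of a boolean array is the fraction of True entries
    return 1.0 - inside.mean()

# Two of the four values lie at least 2*sigma away from the mean
print(fraction_outside(np.array([-3.0, 0.0, 0.5, 3.0]), mean=0.0, sigma=1.0, k=2))
```

Working on the whole array avoids the Python-level loop, which matters once the sample grows to many thousands of values.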

In [114]:
def check_Cheybyshev(data, mean=0.0, sigma=1.0, k=1.0):
    '''
    input: data (1-D NumPy array of samples), mean of the population,
           std of the population, k = arbitrary positive number (usually an int)
    output: prints whether the sample of the population respects Chebyshev's inequality
    
    '''
    total = data.shape[0]
    check=[]
    for item in data:
        check.append(check_inside_interval(x=item, mean=mean, sigma=sigma, k=k))
    check = np.asarray(check)
    # Fraction of samples strictly inside the interval, and its complement
    events_inside_the_interval = (1.0*check.sum())/total
    events_outside_the_interval = 1.0-events_inside_the_interval
    print("fraction of events in the sample outside of "+str(k)+"*sigma :")
    print("                    from simulation: "+str(events_outside_the_interval))
    print("fraction of events in the sample outside of "+str(k)+"*sigma that we expect")
    print("                    from Chebyshev's inequality is at most: "+str(1.0/k**2))
    # Compare two numbers; comparing a float with a str raises TypeError in Python 3
    if (events_outside_the_interval <= 1.0/k**2):
        print("The Chebyshev's inequality is verified!")
    else:
        print("The Chebyshev's inequality is NOT verified!")
    print("================================================")
        

In [115]:
# We check for a normal distribution
mean=1
sigma=10
data=np.random.normal(loc=mean, scale = sigma,size=20000)
check_Cheybyshev(data, mean=mean, sigma=sigma, k=1)
check_Cheybyshev(data, mean=mean, sigma=sigma, k=2)
check_Cheybyshev(data, mean=mean, sigma=sigma, k=3)
check_Cheybyshev(data, mean=mean, sigma=sigma, k=4)
check_Cheybyshev(data, mean=mean, sigma=sigma, k=10)

fraction of events in the sample outside of 1*sigma :
                    from simulation: 0.31245
fraction of events in the sample outside of 1*sigma that we expect
                    from Chebyshev's inequality is at most: 1.0
The Chebyshev's inequality is verified!
fraction of events in the sample outside of 2*sigma :
                    from simulation: 0.04435
fraction of events in the sample outside of 2*sigma that we expect
                    from Chebyshev's inequality is at most: 0.25
The Chebyshev's inequality is verified!
fraction of events in the sample outside of 3*sigma :
                    from simulation: 0.002
fraction of events in the sample outside of 3*sigma that we expect
                    from Chebyshev's inequality is at most: 0.111111111111
The Chebyshev's inequality is verified!
fraction of events in the sample outside of 4*sigma :
                    from simulation: 0.0
fraction of events in the sample outside of 4*sigma that we expect
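
The inequality is stated for an arbitrary population, so it is worth repeating the experiment with distributions that are not normal. The sketch below is one possible way to do this; the choice of an exponential and a uniform population, the seed, and the `results` dictionary are all assumptions made for illustration. The population means and standard deviations are the known closed-form values: mean 1 and standard deviation 1 for the exponential with scale 1, and mean 0.5 and standard deviation $\sqrt{1/12}$ for the uniform on $[0, 1]$.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed, chosen arbitrarily for reproducibility
n = 20000

# (name, samples, population mean, population std)
populations = [
    ("exponential", rng.exponential(scale=1.0, size=n), 1.0, 1.0),
    ("uniform", rng.uniform(0.0, 1.0, size=n), 0.5, np.sqrt(1.0 / 12.0)),
]

results = {}
for name, data, mean, sigma in populations:
    for k in (1, 2, 3, 4):
        # Fraction of samples farther than k*sigma from the population mean
        outside = np.mean(np.abs(data - mean) > k * sigma)
        results[(name, k)] = outside
        print("%-12s k=%d  simulated: %.5f  bound: %.5f"
              % (name, k, outside, 1.0 / k**2))
```

In both cases the simulated fraction should stay below $1/k^2$, usually by a wide margin, because Chebyshev's inequality makes no assumption about the shape of the distribution.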
               

## Mathematical proof of Chebyshev's inequality

Assume that $X$ has a probability density $f(x)$. On the region $\left|x-\mu\right| > k\sigma$ the ratio $\left|x-\mu\right|^2/(k^2\sigma^2)$ is larger than 1, which gives the first inequality. The second inequality holds because the integrand is non-negative, so extending the integral to the whole real line cannot decrease it.

$$
\begin{aligned}
P(\left|X-\mu\right| > k \sigma ) &= \int_{\left|x-\mu\right| > k \sigma } f(x)\, dx \\
&\leq \int_{\left|x-\mu\right| > k \sigma } \frac{\left|x-\mu\right|^2 }{k^2 \sigma^2} f(x)\, dx \\
&\leq \int_{-\infty}^{\infty} \frac{\left|x-\mu\right|^2 }{k^2 \sigma^2} f(x)\, dx \\
&= \frac{1}{k^2 \sigma^2} \int_{-\infty}^{\infty} (x-\mu)^2 f(x)\, dx \\
&= \frac{1}{k^2 \sigma^2} \mathrm{Var}[X] \\
&= \frac{1}{k^2}
\end{aligned}
$$