## Introduction:
In data science and data analytics, the following types of questions are both important and common:
* What percentage of students in a given school have heights within a given interval? (say, [135cm,200cm])
* What percentage of people have their salaries within a given range? 
* What percentage of children have weights in a given interval? and so on…the list is endless.

Now, these questions become easy to answer if we know that the random variable under consideration follows a normal distribution with a given mean and standard deviation. For intervals that span one, two, or three standard deviations around the mean, we can apply the 68–95–99.7 rule directly.

The rule states that, for a normal distribution, about 68% of data points lie within 1 std-dev of the mean ([mean - 1 * std-dev, mean + 1 * std-dev]), about 95% lie within 2 std-dev ([mean - 2 * std-dev, mean + 2 * std-dev]), and about 99.7% lie within 3 std-dev ([mean - 3 * std-dev, mean + 3 * std-dev]).
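
The three percentages follow from the normal cumulative distribution function. As a quick check, the sketch below computes the probability that a normal variable falls within k standard deviations of its mean, using the identity P(|Z| ≤ k) = erf(k/√2). The function name `normal_within` is only illustrative.

```python
import math

def normal_within(k: float) -> float:
    """Probability that a normal variable lies within k std-dev of its mean."""
    return math.erf(k / math.sqrt(2))

for k in (1, 2, 3):
    print(f"within {k} std-dev: {normal_within(k):.4f}")
```

The printed values round to the familiar 68%, 95%, and 99.7%.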

The problem arises **when we don't know the underlying distribution of the random variable that we are dealing with and this is when Chebyshev’s Inequality comes to our rescue**!

It requires only two conditions to be met:
* the mean of the concerned random variable should be finite and
* its standard deviation must be finite and non-zero.

Chebyshev’s Inequality does not give us the exact percentage of data lying within a particular range; instead, it gives a guaranteed minimum (a lower bound) on that percentage.

Chebyshev's inequality is a fundamental result in probability theory and statistics that gives an upper bound on the probability that a random variable deviates from its mean by at least a given number of standard deviations. The inequality is useful because **it applies to any random variable, regardless of the underlying distribution**, as long as the mean and variance are finite.

### Statement of Chebyshev's Inequality:

Let X be a random variable with mean μ and variance σ². For any k>0,

P(∣X−μ∣ ≥ kσ) ≤ 1/k²

In words, this inequality states that the probability that the random variable X deviates from its mean μ by at least k standard deviations σ is at most 1/k². The bound is only informative for k > 1, because for k ≤ 1 the right-hand side is at least 1.

Implications (taking the complement, P(∣X−μ∣ < kσ) ≥ 1 − 1/k²):
* For k=2, the inequality tells us that at least 75% (1 − 1/4) of the values lie within two standard deviations of the mean.
* For k=3, at least about 88.9% (1 − 1/9) of the values lie within three standard deviations of the mean.
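
The guaranteed minimum for other values of k can be tabulated the same way. This is a minimal sketch; the helper name `min_fraction_within` is an assumption made for illustration.

```python
def min_fraction_within(k: float) -> float:
    """Chebyshev lower bound on the fraction of values within k std-dev of the mean."""
    if k <= 0:
        raise ValueError("k must be positive")
    # For k <= 1 the expression 1 - 1/k^2 is zero or negative, so the bound says nothing.
    return max(0.0, 1.0 - 1.0 / k**2)

for k in (1.5, 2, 3, 4):
    print(f"k = {k}: at least {min_fraction_within(k):.1%} of values")
```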

This result is particularly useful because it provides a way to understand the spread and concentration of values for any distribution, not just those that are normal (Gaussian). It's a tool for estimating probabilities and making inferences about data distributions even when specific details about the distribution are not known.
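
To see the distribution-free claim in action, one can simulate a clearly non-normal variable and compare the observed tail frequency with the bound. The sketch below assumes an exponential distribution with rate 1 (mean 1, standard deviation 1), chosen purely for illustration; the seed and sample size are arbitrary.

```python
import random

random.seed(0)  # arbitrary seed, only for reproducibility
n = 100_000
mu, sigma = 1.0, 1.0  # mean and std-dev of an exponential variable with rate 1
samples = [random.expovariate(1.0) for _ in range(n)]

def tail_fraction(k: float) -> float:
    """Observed fraction of samples at least k std-dev away from the mean."""
    return sum(abs(x - mu) >= k * sigma for x in samples) / n

for k in (1.5, 2, 3):
    print(f"k = {k}: observed {tail_fraction(k):.4f}, Chebyshev bound {1 / k**2:.4f}")
```

The observed frequencies should sit well below the bound, which is the price of making no assumption about the distribution.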

Let's understand this with the help of an example:

**Suppose we have a set of exam scores for a class of students. The scores are represented by a random variable X with a mean (μ) of 70 and a standard deviation (σ) of 10. We want to bound the probability that a student's score deviates from the mean by at least 20 points, and Chebyshev's inequality gives us such a bound.**

Solution:

* **Calculate k**: The deviation we are interested in is 20 points. Since the standard deviation is 10, we can express this deviation in terms of the number of standard deviations (k):
      k=20/10=2

* **Apply Chebyshev's Inequality**:
      Chebyshev's inequality states:
      P(∣X−μ∣≥kσ)≤1/k²

      Substituting k=2:
      P(∣X−70∣≥20)≤1/2²=1/4=0.25

* **Interpretation**:

    According to Chebyshev's inequality, the probability that a student's score deviates from the mean score of 70 by at least 20 points is at most 0.25 (or 25%).
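
The same calculation can be scripted. This sketch takes the deviation and standard deviation from the example; the function name `chebyshev_tail_bound` is hypothetical.

```python
def chebyshev_tail_bound(deviation: float, sigma: float) -> float:
    """Upper bound on P(|X - mean| >= deviation) for a variable with std-dev sigma."""
    if deviation <= 0 or sigma <= 0:
        raise ValueError("deviation and sigma must be positive")
    k = deviation / sigma  # deviation expressed in standard deviations
    # A probability can never exceed 1, so cap the bound there.
    return min(1.0, 1.0 / k**2)

print(chebyshev_tail_bound(deviation=20, sigma=10))  # k = 2, prints 0.25
```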

Verification with Data:

Let's say we have the following 10 exam scores: 50, 55, 60, 65, 70, 75, 80, 85, 90, 95.

    Mean (μ) = 72.5
    Standard deviation (σ) ≈ 14.36 (population standard deviation, calculated from the data)

These values differ from the μ = 70 and σ = 10 assumed in the example above, so the bound has to be recomputed for this data set. A deviation of 20 points corresponds to:

    k = 20/14.36 ≈ 1.39
    1/k² ≈ 0.516

Chebyshev's inequality therefore guarantees that at most about 51.6% of these scores deviate from the mean by at least 20 points.

To find the actual proportion of scores that deviate by at least 20 points from the mean:

    Scores at most 52.5 (72.5 - 20) or at least 92.5 (72.5 + 20): 50, 95

* There are 2 scores out of 10 that deviate by at least 20 points.
* This means 20% of the scores deviate by at least 20 points from the mean.

Conclusion:

According to Chebyshev's inequality, no more than about 51.6% of the scores can deviate by at least 20 points from the mean. In this specific example, 20% of the scores do so, which is well within the bound. Chebyshev's inequality provides an upper bound on the probability, not the exact value, so the actual proportion can vary, but it will not exceed the bound given by the inequality.

This example shows that Chebyshev's inequality is conservative and useful for providing bounds when the exact distribution is unknown.
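
The figures for the ten scores can be recomputed directly from the data. The sketch below is one way to do so. It assumes the population standard deviation (`statistics.pstdev`, which divides by n) is the appropriate choice, since the ten scores are treated as the entire distribution rather than a sample.

```python
import statistics

scores = [50, 55, 60, 65, 70, 75, 80, 85, 90, 95]
deviation = 20

mu = statistics.mean(scores)
sigma = statistics.pstdev(scores)  # population standard deviation (divides by n)

k = deviation / sigma
bound = min(1.0, 1 / k**2)  # Chebyshev upper bound on the tail proportion
observed = sum(abs(x - mu) >= deviation for x in scores) / len(scores)

print(f"mean = {mu}, std-dev = {sigma:.2f}, k = {k:.2f}")
print(f"Chebyshev bound = {bound:.3f}, observed proportion = {observed:.2f}")
```

The observed proportion stays below the bound, as the inequality guarantees for any data set when the mean and standard deviation are computed from that same data.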