## Introduction to descriptive statistics

Descriptive statistics is a crucial part of any data analysis. It helps us understand the distribution of the data, identify patterns and trends, and summarize the data in a way that is easy to understand. The two key concepts we will discuss in this notebook are the mean and the median.

---

## Average and its interpretation

The average, also known as the mean, is a statistical measure used to describe the center of a data distribution. It is calculated by adding all the values in the data set and dividing the sum by the number of values.
Although the average is a useful way to summarize a data set, it can be misleading when the data contain outliers. The average is sensitive to outliers: a single value far higher or lower than the rest can pull it strongly toward that value.
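In symbols, for $n$ values $x_1, \dots, x_n$ the mean is $\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$. The sketch below computes it straight from that definition with a hypothetical helper `mean_by_definition` (our own name, not a library function) on made-up incomes, and checks one consequence of the formula: raising a single value by `d` moves the mean by exactly `d / n`. Because `d` can be arbitrarily large, so can the shift in the mean.

```python
import numpy as np

def mean_by_definition(values):
    """Sum all values and divide by how many there are (hypothetical helper)."""
    values = list(values)
    if not values:
        raise ValueError("the mean of an empty data set is undefined")
    return sum(values) / len(values)

# Hypothetical incomes for five people (illustrative values only)
incomes = [80, 95, 100, 110, 115]
print(mean_by_definition(incomes))  # 100.0
print(np.mean(incomes))             # 100.0, NumPy agrees

# Raise the last value by d: the mean moves by exactly d / n
d = 1_000
shifted = incomes[:-1] + [incomes[-1] + d]
print(mean_by_definition(shifted) - mean_by_definition(incomes))  # 200.0
```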

## Median and its interpretation

The median is another statistical measure used to describe the center of a data distribution. Unlike the average, it is the middle value of the data once the values are sorted in ascending or descending order. When the number of values is even, there is no single middle value, and the median is conventionally the average of the two middle values.
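A minimal sketch of this definition in Python, using a hypothetical helper `median_by_definition` and checking it against NumPy's `np.median`, which also averages the two middle values for even-length input:

```python
import numpy as np

def median_by_definition(values):
    """Middle value of the sorted data; mean of the two middle values if the count is even."""
    ordered = sorted(values)
    n = len(ordered)
    if n == 0:
        raise ValueError("the median of an empty data set is undefined")
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]  # odd count: exactly one middle value
    return (ordered[mid - 1] + ordered[mid]) / 2  # even count: average the two middle values

print(median_by_definition([7, 1, 5]))     # 5    (sorted: 1, 5, 7)
print(median_by_definition([7, 1, 5, 3]))  # 4.0  (sorted: 1, 3, 5, 7)
print(np.median([7, 1, 5, 3]))             # 4.0
```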

The median is a resistant measure: it is much less sensitive to outliers than the mean, because making the most extreme values even more extreme does not change which value sits in the middle. Therefore, in a data set with outliers, the median is often a better representation of the center of the distribution.
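One way to see this resistance is to replace the largest value in a data set with larger and larger numbers. In the sketch below (the incomes are hypothetical), the mean climbs with every replacement, while the median stays put because the middle of the sorted list is still the same value.

```python
import numpy as np

base = np.array([40, 55, 60, 70, 85])  # hypothetical incomes, already sorted

results = []
for outlier in [85, 1_000, 1_000_000]:
    data = base.copy()
    data[-1] = outlier  # swap the largest value for an ever bigger one
    results.append((outlier, np.mean(data), np.median(data)))

for outlier, mean, median in results:
    print(f"largest value={outlier:>9,}  mean={mean:>12,.1f}  median={median:.1f}")
```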

## Comparison between mean and median

Both the mean and the median are measures of central tendency used to understand the center of a distribution of data. However, they have different properties:

- **Mean:** It is useful when the data are roughly symmetric and contain no extreme values. However, it can be misleading if there are extreme values in the data.
- **Median:** It is useful when the data has extreme values that could skew the mean. The median is resistant to extreme values and can provide a better representation of the center of the data distribution in such cases.

In practice, it is useful to calculate both the mean and the median and compare them to understand the distribution of the data.
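A minimal sketch of such a comparison, using a hypothetical helper `describe_center`. It scales the gap between mean and median by the standard deviation and reports which way the data seem to lean. This is a rule-of-thumb heuristic, not a guarantee, and the default threshold of 0.10 is an arbitrary choice for illustration.

```python
import numpy as np

def describe_center(values, tolerance=0.10):
    """Return the mean, the median, and a rough hint about the shape of the data.

    The hint compares (mean - median) / standard deviation with `tolerance`.
    This is a heuristic; the default threshold of 0.10 is an arbitrary choice.
    """
    values = np.asarray(values, dtype=float)
    mean = float(np.mean(values))
    median = float(np.median(values))
    spread = float(np.std(values))
    if spread == 0:
        hint = "all values are identical"
    elif (mean - median) / spread > tolerance:
        hint = "mean above median: possibly a long right tail or high outliers"
    elif (median - mean) / spread > tolerance:
        hint = "mean below median: possibly a long left tail or low outliers"
    else:
        hint = "mean and median are close"
    return mean, median, hint

print(describe_center([100] * 10))
print(describe_center([100] * 10 + [100_000_000_000]))
print(describe_center([20, 22, 25, 27, 30]))
```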

## Example: Bill Gates' Effect on a Bar

To illustrate how the mean can be misleading and how the median can be a better representation in certain cases, consider the following example:

Imagine that Bill Gates, one of the richest men in the world, walks into a bar. The average income of the people in the bar would increase dramatically. However, this does not mean that everyone in the bar suddenly became richer. This is an example of how the mean can be misleading.

On the other hand, the median income is barely affected by Bill Gates' entrance. Adding a single extreme value can only nudge the median toward the next value up in the sorted list, so it continues to represent the incomes of most people in the bar.

Next, we will simulate this scenario using Python code.

---

## Influence of Outliers on Average and Median

---

```python
# Import necessary libraries
import numpy as np

# Assume there are 10 people in the bar and each of them earns $100
incomes = np.repeat(100, 10)

# Calculate the average income
average_income = np.mean(incomes)
print(f'Average income before Bill Gates enters the bar: {average_income}')

# Calculate the median income
median_income = np.median(incomes)
print(f'Median income before Bill Gates enters the bar: {median_income}')

# Now Bill Gates enters the bar. His net worth is around $100 billion;
# for simplicity, we treat that figure as his income
incomes = np.append(incomes, 100_000_000_000)

# Calculate the new average income
average_income = np.mean(incomes)
print(f'Average income after Bill Gates enters the bar: {average_income}')

# Calculate the new median income
median_income = np.median(incomes)
print(f'Median income after Bill Gates enters the bar: {median_income}')
```

```
Average income before Bill Gates enters the bar: 100.0
Median income before Bill Gates enters the bar: 100.0
Average income after Bill Gates enters the bar: 9090909181.818182
Median income after Bill Gates enters the bar: 100.0
```

The aim of this code is to demonstrate the effect of outliers (extreme values) on the average (mean) and the median.

---

**Code Explanation:**

1. **Initial Setup:** We begin by simulating the incomes of 10 people in a bar, each earning $100.
2. **Calculating Average and Median Before the Outlier:** We compute and display the average and median incomes for the initial group.
3. **Introducing an Outlier:** We simulate Bill Gates entering the bar with a net worth of $100 billion, which we treat as his income. This value is a massive outlier compared with the $100 incomes.
4. **Calculating Average and Median After the Outlier:** We compute and display the average and median incomes again, this time including the outlier.



The key takeaway from this simulation is that the outlier drastically changes the average (mean) but, in this example, leaves the median completely unchanged. This demonstrates that the median is far more robust to extreme values than the mean.
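The size of the jump follows directly from the definition of the mean: the ten incomes of 100 and the single value of 100 billion are summed and divided among 11 people. The median, by contrast, is the 6th of the 11 sorted values, which is still 100.

$$
\bar{x}_{\text{after}} = \frac{10 \times 100 + 100{,}000{,}000{,}000}{11} = \frac{100{,}000{,}001{,}000}{11} \approx 9{,}090{,}909{,}181.82
$$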



Simulations like this make the difference concrete: the mean is susceptible to outliers, while the median is relatively robust. When real-world data may contain extreme values, such as incomes, understanding this difference helps in choosing the right measure of central tendency.

As the results show, the average income jumped from 100 to 9,090,909,181.82 when Bill Gates entered the bar, while the median income stayed at 100. The median still describes what a typical person in the bar earns; the mean, pulled up by a single extreme value, describes nobody in the room. This is why the median is often the better summary when extreme values are present.
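The simulation gives everyone the same income, so the median cannot move at all. With varied incomes it does move, but only slightly. The sketch below uses hypothetical incomes for ten patrons (illustrative figures, not real data) and, as before, treats the $100 billion figure as if it were income.

```python
import numpy as np

# Hypothetical incomes for ten patrons (illustrative values only), already sorted
incomes = np.array([30, 45, 50, 60, 70, 80, 90, 110, 150, 400], dtype=float)
with_gates = np.append(incomes, 100_000_000_000)

print(f"median before: {np.median(incomes):.1f}")     # average of the 5th and 6th sorted values
print(f"median after:  {np.median(with_gates):.1f}")  # the 6th of 11 sorted values
print(f"mean before:   {np.mean(incomes):,.1f}")
print(f"mean after:    {np.mean(with_gates):,.1f}")
```

Here the median moves from 75.0 to 80.0, the next value up in the sorted list, while the mean jumps from 108.5 to roughly 9.09 billion.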