# Measures of Central Tendency

Measures of central tendency summarize a dataset with a single value that represents its center. The most commonly used measures are the mean, median, and mode.

---

## 1. Mean (Arithmetic Mean)
**Definition:** The sum of all values divided by the number of values. It is the most common measure of center and is often called the average.

$$
\text{Mean} = \frac{\sum_{i=1}^{n} x_i}{n}
$$

- **$\sum$ (Sigma)**: Summation symbol, meaning "add up all values."
- **$x_i$**: Represents each data point in the dataset.
- **$n$**: Total number of data points.

**Steps to Calculate**:
1. Add up all the values in the dataset.
2. Divide the sum by the number of data points.

**Example:**
- Dataset: $[1, 2, 4, 6, 7]$
- Sum: $1 + 2 + 4 + 6 + 7 = 20$
- Mean: $20 \div 5 = 4$

**Notation:**
- **Population Mean ($\mu$)**: Used when the dataset represents the entire population.
- **Sample Mean ($\bar{x}$)**: Used when the dataset is a sample of the population.
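
The calculation steps above can be sketched in a few lines of Python. This is a minimal illustration rather than a reference implementation; the function name `arithmetic_mean` is an assumption made for this example.

```python
def arithmetic_mean(values):
    """Return the sum of the values divided by how many there are."""
    if not values:
        raise ValueError("the mean of an empty dataset is undefined")
    return sum(values) / len(values)


# Dataset from the example above.
data = [1, 2, 4, 6, 7]
print(arithmetic_mean(data))  # 4.0
```

The same arithmetic applies whether the result is interpreted as $\mu$ or $\bar{x}$; only the notation differs.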

---

## 2. Weighted Mean
**Use Case:** When data points have different weights or importance.

$$
\text{Weighted Mean} = \frac{\sum_{i=1}^{n} (w_i \cdot x_i)}{\sum_{i=1}^{n} w_i}
$$
- **$w_i$**: Weight of each data point.
- **$x_i$**: Value of each data point.

**Example:**
- Dataset: Department satisfaction scores with varying department sizes.
- Calculation: Multiply each score by its department size, sum the products, and divide by the total number of employees.
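
To make the department example concrete, here is a small sketch of the formula. The scores and department sizes are hypothetical numbers chosen only for illustration, and `weighted_mean` is an assumed helper name.

```python
def weighted_mean(values, weights):
    """Return sum(w_i * x_i) / sum(w_i)."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    total_weight = sum(weights)
    if total_weight == 0:
        raise ValueError("the weights must not sum to zero")
    return sum(w * x for w, x in zip(weights, values)) / total_weight


# Hypothetical data: satisfaction score and headcount per department.
scores = [4.0, 3.0, 5.0]
sizes = [10, 30, 10]

print(weighted_mean(scores, sizes))  # 3.6
print(sum(scores) / len(scores))     # 4.0 (unweighted, for comparison)
```

With these made-up numbers, the largest department pulls the weighted mean below the unweighted one, because its lower score counts for more employees.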

---

## 3. Truncated Mean
**Use Case:** When outliers are present in the dataset.

**Steps:**
1. Sort the data and remove the same percentage of values from the lowest and highest ends.
2. Calculate the mean of the remaining data.

**Example:**
- Dataset: $[16, 18, 21, 27, 32, 33, 91]$
- Remove the highest (91) and lowest (16) values.
- Calculate the mean of the remaining values $[18, 21, 27, 32, 33]$: $131 \div 5 = 26.2$.

**Caution:** Always communicate when data has been truncated to avoid misinterpretation.
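
The example can be reproduced with the sketch below. For simplicity it assumes trimming by a count `k` from each end rather than by a percentage; `truncated_mean` and `k` are names introduced here for illustration only.

```python
def truncated_mean(values, k=1):
    """Drop the k smallest and k largest values, then average the rest."""
    if k < 0 or 2 * k >= len(values):
        raise ValueError("k must leave at least one value to average")
    ordered = sorted(values)
    kept = ordered[k:len(ordered) - k]
    return sum(kept) / len(kept)


data = [16, 18, 21, 27, 32, 33, 91]

print(truncated_mean(data, k=1))  # 26.2
print(sum(data) / len(data))      # 34.0 (plain mean, pulled up by 91)
```

Comparing the two printed values shows how much the single value 91 shifts the ordinary mean.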

---

## 4. Mode
**Definition:** The value that occurs most frequently in a dataset.

**Types:**
- **Unimodal:** One mode.
- **Bimodal:** Two modes. Indicates two distinct peaks in the data distribution.
- **No Mode:** All values occur with the same frequency.

**Example:**
- Dataset: $[16, 18, 21, 27, 32, 32, 33]$
- Mode: $32$ (occurs twice).
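
A minimal sketch of finding the mode by counting occurrences follows. The helper name `modes` is an assumption; it returns a list so that bimodal data can be represented.

```python
from collections import Counter


def modes(values):
    """Return every value tied for the highest frequency, in first-seen order."""
    counts = Counter(values)
    if not counts:
        return []
    top = max(counts.values())
    return [value for value, count in counts.items() if count == top]


print(modes([16, 18, 21, 27, 32, 32, 33]))  # [32]
print(modes([1, 1, 2, 2, 3]))               # [1, 2]
print(modes(["cat", "dog", "dog"]))         # ['dog']
```

When every value occurs equally often, this sketch returns all of them; a caller would need to treat that result as the "no mode" case described above.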

---

## 5. Median
**Definition:** The middle value of an ordered dataset.

**Steps to Calculate:**
1. Arrange the data in ascending or descending order.
2. Identify the middle value:
    - If the dataset has an odd number of values, the median is the middle value.
    - If the dataset has an even number of values, the median is the average of the two middle values.

**Example:**
- Dataset: $[16, 18, 21, 27, 32, 33]$
- Median: $(21 + 27) \div 2 = 24$
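
The two cases in the steps above (odd and even length) can be sketched as follows. The function name `median` is chosen for this example; Python's standard library offers `statistics.median` for the same purpose.

```python
def median(values):
    """Return the middle value of the sorted data (mean of the two middle values if even)."""
    if not values:
        raise ValueError("the median of an empty dataset is undefined")
    ordered = sorted(values)
    mid = len(ordered) // 2
    if len(ordered) % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2


print(median([16, 18, 21, 27, 32, 33]))  # 24.0 (even count)
print(median([33, 16, 21, 18, 27]))      # 21 (odd count, unsorted input)
```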

---

## Choosing the Right Measure of Central Tendency
**Factors to Consider:**

1. **Type of Data:**
    - **Numeric vs. Non-Numeric**: Mean and median require numeric data; mode can be used for non-numeric data.
    - **Continuous vs. Discrete**: Mean is suitable for both, but median is often preferred for skewed data.
    - **Nominal vs. Ordinal**: Median and mode are appropriate for ordinal data; mode is the only option for nominal data.

2. **Presence of Outliers:**
    - **Mean**: Sensitive to outliers.
    - **Median**: Robust against outliers.
    - **Mode**: Generally unaffected by outliers, since an extreme value is rarely the most frequent one.

3. **Context of the Data:**
    - **Example**: Median is often used for income data to avoid skewing by extreme values.
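
To illustrate the outlier sensitivity described above, the sketch below compares the mean and median of the truncated-mean dataset with and without its extreme value. It uses `fmean` and `median` from Python's standard `statistics` module; the variable names are illustrative.

```python
from statistics import fmean, median

without_outlier = [16, 18, 21, 27, 32, 33]
with_outlier = without_outlier + [91]

# Mean and median before adding the extreme value.
print(fmean(without_outlier), median(without_outlier))  # 24.5 24.0

# Mean and median after adding 91.
print(fmean(with_outlier), median(with_outlier))        # 34.0 27
```

Adding one extreme value moves the mean by 9.5 but the median by only 3, which is why the median is often preferred for skewed data such as income.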

---


## Practical Applications
**Mean:**
- Used for symmetric datasets without outliers.
- Example: Average height of a group of people.

**Weighted Mean:**
- Useful when data points have different levels of importance.

**Truncated Mean:**
- Useful for datasets with extreme outliers.

**Median:**
- Used for skewed datasets or datasets with outliers.
- Example: Median household income.

**Mode:**
- Used for categorical data or identifying the most common value.
- Example: Most common pet in a household.