
# Descriptive Statistics: Mean, Median, Mode, Standard Deviation, and Variance

## Overview
Descriptive statistics summarize a dataset through measures of central tendency (where the values cluster) and dispersion (how widely they spread). They are the usual first step in understanding and interpreting a data distribution.

In this notebook, we will cover:

- **Mean**: The average value of a dataset.
- **Median**: The middle value, which separates the data into two halves.
- **Mode**: The most frequently occurring value(s) in the dataset.
- **Standard Deviation**: A measure of the spread of data from the mean.
- **Variance**: The square of the standard deviation, representing data dispersion.


## 1. Mean, Median, and Mode

These are measures of central tendency, which describe where most data points in a dataset tend to cluster.

- **Mean** is calculated by summing all values and dividing by the number of values.
- **Median** is the middle value of a sorted dataset (or the average of two middle values if the dataset has an even number of elements).
- **Mode** is the most frequently occurring value in the dataset. A dataset can have more than one mode when several values tie for the highest count.
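
To make the first two definitions concrete, here is a minimal pure-Python sketch that follows them step by step. The helper names `mean_by_hand` and `median_by_hand` and the small `sample` list are hypothetical, chosen only for illustration; in practice the library functions do the same work.

```python
# Hypothetical helpers that mirror the textual definitions of mean and median.

def mean_by_hand(values):
    # Sum all values and divide by the number of values.
    return sum(values) / len(values)

def median_by_hand(values):
    # Sort first, then take the middle value (odd length)
    # or the average of the two middle values (even length).
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2

sample = [2, 4, 4, 5, 10]              # illustrative data, odd length
print(mean_by_hand(sample))            # 5.0
print(median_by_hand(sample))          # 4
print(median_by_hand([1, 2, 3, 4]))    # 2.5 (even length: average of 2 and 3)
```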


In [6]:

import numpy as np
from scipy import stats

# Sample dataset
data = [5, 7, 3, 7, 9, 10, 6, 5, 8, 7, 5]

# Mean
mean = np.mean(data)
print("Mean:", mean)

# Median
median = np.median(data)
print("Median:", median)

# Mode (when several values tie, scipy.stats.mode returns only the smallest one)
mode = stats.mode(data)
print("Mode:", mode.mode)


Mean: 6.545454545454546
Median: 7.0
Mode: 5
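
In this dataset both 5 and 7 occur three times, so it has two modes, while `stats.mode` reports only the smallest of them. One way to see every mode is `statistics.multimode` from the Python standard library (available from Python 3.8). This is a minimal sketch that repeats the dataset so it runs on its own; `all_modes` is just an illustrative variable name.

```python
from statistics import multimode

# Same sample dataset as above
data = [5, 7, 3, 7, 9, 10, 6, 5, 8, 7, 5]

# multimode returns every value that ties for the highest count,
# in the order the values first appear in the data.
all_modes = multimode(data)
print("Modes:", all_modes)  # Modes: [5, 7]
```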



## 2. Standard Deviation and Variance

These are measures of dispersion, describing how spread out the values in a dataset are.

- **Standard Deviation** (\( \sigma \)) is the square root of the average squared deviation from the mean.
- **Variance** (\( \sigma^2 \)) is the average of the squared deviations from the mean.
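
Written as formulas, the two definitions look as follows. The symbols are notation assumed here for illustration: \( N \) is the size of a full population with mean \( \mu \), and \( n \) is the size of a sample with mean \( \bar{x} \). The sample version divides by \( n - 1 \) instead of \( n \) and is usually written \( s^2 \).

\[
\sigma^2 = \frac{1}{N} \sum_{i=1}^{N} (x_i - \mu)^2,
\qquad
s^2 = \frac{1}{n - 1} \sum_{i=1}^{n} (x_i - \bar{x})^2,
\qquad
\sigma = \sqrt{\sigma^2}, \quad s = \sqrt{s^2}
\]

For the sample dataset, \( n = 11 \), \( \bar{x} = 72 / 11 \), and the squared deviations sum to \( 448 / 11 \approx 40.727 \), so \( s^2 \approx 40.727 / 10 \approx 4.0727 \) and \( s \approx 2.0181 \).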

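The definitions above divide by the number of values, which gives the population standard deviation and variance. When the data is a sample from a larger population, the sum of squared deviations is usually divided by n - 1 instead. The code below does this by passing `ddof=1` to NumPy.

Let's calculate the sample standard deviation and variance for our dataset.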


In [7]:
# Sample standard deviation (ddof=1 divides by n - 1 instead of n)
std_dev = np.std(data, ddof=1)
print("Standard Deviation:", std_dev)

# Sample variance (same ddof=1 correction)
variance = np.var(data, ddof=1)
print("Variance:", variance)


Standard Deviation: 2.018099916438052
Variance: 4.072727272727273
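
To see what `ddof` changes, the sketch below compares the sample estimate (`ddof=1`) with NumPy's default (`ddof=0`, the population formula) and checks that the standard deviation squared equals the variance. The variable names `sample_var`, `population_var`, and `sample_std` are illustrative, and the dataset is repeated so the block runs on its own.

```python
import numpy as np

# Same sample dataset as above
data = [5, 7, 3, 7, 9, 10, 6, 5, 8, 7, 5]
n = len(data)

sample_var = np.var(data, ddof=1)   # divides by n - 1
population_var = np.var(data)       # default ddof=0 divides by n
sample_std = np.std(data, ddof=1)

print("Sample variance:", sample_var)
print("Population variance:", population_var)

# The standard deviation is the square root of the variance.
print(np.isclose(sample_std ** 2, sample_var))                  # True

# The two variances differ only by the factor (n - 1) / n.
print(np.isclose(sample_var * (n - 1) / n, population_var))     # True
```

The population figure is always the smaller of the two, and the gap shrinks as the number of values grows.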



## Summary

In this notebook, we covered:

- **Mean**: The average of the dataset.
- **Median**: The middle value in a sorted dataset.
- **Mode**: The most frequently occurring value in the dataset.
- **Standard Deviation**: A measure of the spread of data from the mean.
- **Variance**: The square of the standard deviation, indicating data dispersion.