# Descriptive Statistics Notes

Descriptive statistics is the branch of statistics concerned with summarizing and presenting data concisely. It describes the main features and patterns of a dataset through quantitative measures and visual representations, which help in understanding the data, identifying trends, and making data-driven decisions.



## Key aspects of descriptive statistics

- Central Tendency Measures: Descriptive statistics often involves calculating measures that represent the center or average of a dataset, such as the mean, median, and mode. These measures provide a sense of the typical or central value around which the data points tend to cluster.

- Measures of Dispersion: Descriptive statistics also includes measures that quantify the spread, variability, or dispersion of the data. Common measures of dispersion include the range, variance, and standard deviation. These measures provide insights into how the data points are distributed and how much they deviate from the central tendency.

- Data Visualization: Descriptive statistics involves presenting data visually using graphs, charts, and plots. Visual representations like histograms, box plots, scatter plots, and bar charts help in understanding the distribution, patterns, and relationships within the data.

- Summarizing Categorical Data: Descriptive statistics is not limited to numerical data. It also encompasses techniques for summarizing and describing categorical data using frequency tables, proportions, and percentages. These techniques help in understanding the composition and characteristics of categorical variables.

- Interpreting and Communicating Results: Descriptive statistics is focused on providing clear and concise summaries of data that can be easily interpreted and communicated to others. It aims to convey the main features and insights of the data in a way that is accessible to a broad audience.
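
The categorical summaries mentioned above (frequency tables, proportions, percentages) can be sketched with the standard library alone. The `colors` list is made-up illustration data, not taken from these notes.

```python
from collections import Counter

# Hypothetical categorical sample; the values are invented for illustration.
colors = ["red", "blue", "red", "green", "blue", "red"]

counts = Counter(colors)                      # frequency table
total = sum(counts.values())
proportions = {category: count / total for category, count in counts.items()}

# Print each category with its count and percentage, most frequent first
for category, count in counts.most_common():
    print(f"{category}: {count} ({proportions[category]:.1%})")
```

Each line of output pairs a category with its frequency and its share of the sample, which is the information a frequency table carries.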

## Descriptive Statistics Applied Using Python

### Central Tendency Measures

In [2]:
import numpy as np

#### Mean

In [6]:
data = [1, 2, 3, 4, 5, 6, 7]
mean = np.mean(data)
print('The mean is:', mean)

The mean is: 4.0


#### Median

In [7]:
median = np.median(data)
print('The median is:', median)

The median is: 4.0
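
The mean and median coincide here because the data are symmetric. As a minimal sketch of how they can diverge, the example below replaces the largest value with a hypothetical outlier (700, chosen only for illustration) and uses the standard-library `statistics` module instead of NumPy.

```python
import statistics

# The symmetric data from above, and a copy with a made-up outlier
symmetric = [1, 2, 3, 4, 5, 6, 7]
with_outlier = [1, 2, 3, 4, 5, 6, 700]

mean_sym = statistics.mean(symmetric)
median_sym = statistics.median(symmetric)
mean_out = statistics.mean(with_outlier)
median_out = statistics.median(with_outlier)

print("symmetric:   ", mean_sym, median_sym)
print("with outlier:", mean_out, median_out)
```

The outlier pulls the mean well away from the bulk of the data, while the median stays at 4.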


#### Mode

In [9]:
from scipy import stats

data = [1, 2, 2, 3, 4]
mode = stats.mode(data, keepdims=True)
print("Mode:", mode.mode[0])

Mode: 2
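
A dataset can have more than one mode, and a single reported value can hide that. As a sketch, the standard-library function `statistics.multimode` (Python 3.8+) returns every most-frequent value. The `data_tied` list is made up for illustration.

```python
from statistics import multimode

# Hypothetical data in which 2 and 3 both occur twice
data_tied = [1, 2, 2, 3, 3, 4]

modes = multimode(data_tied)  # list of all most-frequent values
print("Modes:", modes)
```

When ties are possible, checking the full list of modes avoids reading too much into a single value.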


### Measures of Spread

Measures of spread, also known as measures of dispersion, provide information about how the data points are spread out or dispersed around a central value. Here's a summary of the key measures of spread and corresponding Python code examples:



#### Range

In [15]:
# Integers from 100 to 500 inclusive (401 values)
X = np.arange(100, 501).tolist()
print(X[:5], '...', X[-5:])

[100, 101, 102, 103, 104] ... [496, 497, 498, 499, 500]

In [17]:
# Calculate the range (max - min); the name data_range avoids shadowing the built-in range()
data_range = max(X) - min(X)
print(data_range)

400


#### Variance

Variance is the average squared deviation of the data points from the mean; the larger it is, the more spread out the data are. By default, `np.var` divides by n (`ddof=0`), which gives the population variance.

In [19]:
variance = np.var(X)
print(variance)

13400.0
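
To make the definition concrete, the sketch below recomputes the variance by hand and also shows the sample variance, which divides by n - 1 instead of n. It rebuilds the same data with plain Python so that it runs on its own. In NumPy the sample version corresponds to `np.var(X, ddof=1)`.

```python
# Same values as X above: the integers 100..500
X = list(range(100, 501))
n = len(X)
mean_x = sum(X) / n

squared_devs = [(x - mean_x) ** 2 for x in X]
population_var = sum(squared_devs) / n      # divide by n, like np.var(X)
sample_var = sum(squared_devs) / (n - 1)    # divide by n - 1, like np.var(X, ddof=1)

print(population_var)
print(sample_var)
```

The population form describes exactly the data at hand. The sample form is the usual choice when the data are a sample used to estimate the variance of a larger population.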


#### Standard Deviation

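The standard deviation is the square root of the variance. Because it is expressed in the same units as the data, it is easier to interpret than the variance. It measures the typical (root-mean-square) distance of the data points from the mean, which is not the same as the plain average distance.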

In [21]:
std_dev = np.std(X)
print(std_dev)

115.75836902790225
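
Since the standard deviation is the square root of the variance, it is a root-mean-square deviation and generally differs from the plain average of the absolute distances to the mean. A small sketch on the same data, written in plain Python so that it is self-contained:

```python
import math

# Same values as X above: the integers 100..500
X = list(range(100, 501))
n = len(X)
mean_x = sum(X) / n

variance = sum((x - mean_x) ** 2 for x in X) / n
std_dev = math.sqrt(variance)                        # root-mean-square deviation
mean_abs_dev = sum(abs(x - mean_x) for x in X) / n   # plain average distance

print(round(std_dev, 2))
print(round(mean_abs_dev, 2))
```

Squaring weights large deviations more heavily, so the standard deviation comes out larger than the average absolute distance here.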


#### Interquartile Range

The IQR represents the range between the first quartile (25th percentile) and the third quartile (75th percentile). It is useful for identifying the spread of the middle 50% of the data.

In [22]:
q1 = np.percentile(X, 25)   # first quartile
q3 = np.percentile(X, 75)   # third quartile
iqr = q3 - q1
print(iqr)

200.0
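
As a quick check of the "middle 50%" reading, the sketch below recomputes the quartiles and counts how many values fall between them. It uses the standard-library `statistics.quantiles` with `method="inclusive"`, which interpolates linearly between data points and gives the same quartiles as above for this data. That choice is an assumption of this example, not part of the original notebook.

```python
from statistics import quantiles

# Same values as X above: the integers 100..500
X = list(range(100, 501))

# n=4 splits the data into quarters and returns the three cut points
q1, q2, q3 = quantiles(X, n=4, method="inclusive")
iqr = q3 - q1

# Count the values lying between the first and third quartiles
inside = [x for x in X if q1 <= x <= q3]
share_inside = len(inside) / len(X)

print(q1, q3, iqr)
print(round(share_inside, 3))
```

About half of the values lie between the two quartiles, as the definition says.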


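#### Median Absolute Deviation (MAD)

MAD here is the median absolute deviation: the median of the absolute deviations of the data points from the median. It is a robust measure of spread that is less sensitive to extreme values than the variance or standard deviation. The mean absolute deviation, which averages the absolute deviations from the mean, shares the acronym but is a different statistic.

In [25]:
# Median of the absolute deviations from the median
X_arr = np.array(X)
mad = np.median(np.abs(X_arr - np.median(X_arr)))
print('Median Absolute Deviation:', mad)

Median Absolute Deviation: 100.0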
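
Because the acronym MAD is used for both the mean absolute deviation and the median absolute deviation, it helps to see the two side by side. The sketch below computes both with the standard-library `statistics` module on a made-up sample containing one extreme value. The `sample` list is an illustrative assumption.

```python
import statistics

# Hypothetical sample with one extreme value (100)
sample = [1, 2, 3, 4, 100]

# Mean absolute deviation: average distance from the mean
center_mean = statistics.mean(sample)
mean_abs_dev = statistics.mean(abs(x - center_mean) for x in sample)

# Median absolute deviation: median distance from the median
center_median = statistics.median(sample)
median_abs_dev = statistics.median(abs(x - center_median) for x in sample)

print("Mean absolute deviation:  ", mean_abs_dev)
print("Median absolute deviation:", median_abs_dev)
```

The single extreme value inflates the mean-based figure, while the median-based figure still reflects the spread of the bulk of the data. That robustness is why the median version is often chosen.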
