# Statistical Measures in Data Science - Beginner Lecture

## Today's Lessons
1. Introduction to NumPy
2. Measures of Central Tendency
3. Measures of Dispersion
4. Covariance and Correlation

In [24]:
import numpy as np
from scipy import stats


## 1. Introduction to NumPy

In [25]:
# Create a list and convert to NumPy array
data = [10, 20, 30, 40, 50]
array = np.array(data)
print(array)

[10 20 30 40 50]
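One practical reason to convert a list to an array is that arithmetic then applies to every element at once. The sketch below reuses the same `data`; the names `doubled`, `shifted`, and `repeated` are illustrative and not part of the original lecture.

```python
import numpy as np

data = [10, 20, 30, 40, 50]
array = np.array(data)

# Arithmetic on an array is applied element by element
doubled = array * 2
shifted = array + 5

# The same operator on a plain list repeats the list instead
repeated = data * 2

print(doubled)
print(shifted)
print(len(repeated))
```

This elementwise behavior is what lets functions such as `np.mean` and `np.var` work on whole datasets without explicit loops.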


## 2. Measures of Central Tendency

In [26]:
data = [2, 4, 4, 4, 5, 7, 9]

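mean = np.mean(data)      # arithmetic average
median = np.median(data)  # middle value of the sorted data
mode = stats.mode(data)   # result object holding the mode and its count

mode_value = mode.mode    # the most frequent value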

print("Mean:", mean)
print("Median:", median)
print("Mode:", mode_value)

Mean: 5.0
Median: 4.0
Mode: 4


### Formula for Mean:
Mean = (x1 + x2 + ... + xn) / n

In [27]:
sum(data) / len(data)

5.0
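The median and mode can be checked by hand in the same spirit. This is a minimal sketch using only the standard library; `manual_median` and `manual_mode` are hypothetical helper names, not functions from NumPy or SciPy.

```python
from collections import Counter

data = [2, 4, 4, 4, 5, 7, 9]

def manual_median(values):
    """Return the middle value of the sorted data (average of the two middle values if n is even)."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return float(ordered[mid])
    return (ordered[mid - 1] + ordered[mid]) / 2

def manual_mode(values):
    """Return the most frequent value; on a tie, the value seen first wins."""
    return Counter(values).most_common(1)[0][0]

print("Median:", manual_median(data))
print("Mode:", manual_mode(data))
```

Note that the tie-breaking rule here (first value encountered) may differ from `stats.mode`, which reports the smallest of the tied values.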

## 3. Measures of Dispersion

In [28]:
data = [10, 12, 23, 23, 16, 23, 21, 16]

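range_val = np.max(data) - np.min(data)
variance = np.var(data)  # population variance (ddof=0 is the default)
std_dev = np.std(data)   # population standard deviation (ddof=0 is the default)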

print("Range:", range_val)
print("Variance:", variance)
print("Standard Deviation:", std_dev)

Range: 13
Variance: 24.0
Standard Deviation: 4.898979485566356


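### Formula for Variance:
Population variance = Σ(x - μ)^2 / n, where μ is the mean of the data and n is the number of values. This is what `np.var` computes by default.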

In [29]:
mean = sum(data) / len(data)
variance = sum((x - mean)**2 for x in data) / len(data)
variance

24.0
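Dividing by n gives the population variance. When the data is a sample drawn from a larger population, the usual estimate divides by n - 1 instead, which NumPy exposes through the `ddof` argument. A short sketch on the same data; the variable names are illustrative.

```python
import numpy as np

data = [10, 12, 23, 23, 16, 23, 21, 16]

# ddof=0 (the default) divides by n: population variance
pop_var = np.var(data)

# ddof=1 divides by n - 1: sample variance
sample_var = np.var(data, ddof=1)
sample_std = np.std(data, ddof=1)

print("Population variance:", pop_var)
print("Sample variance:", sample_var)
print("Sample standard deviation:", sample_std)
```

The sample variance is always slightly larger than the population variance for the same data, and the gap shrinks as n grows.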

## 4. Covariance and Correlation

In [30]:
x = [2, 4, 6, 8]
y = [1, 3, 2, 5]

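# bias=True divides by n (population covariance) instead of n - 1
cov_matrix = np.cov(x, y, bias=True)
cov_xy = cov_matrix[0, 1]  # off-diagonal entry is Cov(x, y)

corr_matrix = np.corrcoef(x, y)
corr_xy = corr_matrix[0, 1]  # off-diagonal entry is the Pearson correlation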

print("Covariance:", cov_xy)
print("Correlation:", corr_xy)

Covariance: 2.75
Correlation: 0.8315218406202999


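### Correlation Formula:
Correlation = Cov(x, y) / (std_x * std_y), where the covariance and both standard deviations use the same normalization (here, division by n).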

In [31]:
corr = cov_xy / (np.std(x) * np.std(y))
corr

np.float64(0.8315218406202998)
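The covariance itself can also be computed from its definition, Cov(x, y) = Σ(x - μx)(y - μy) / n. The sketch below does this in plain Python and compares the result with `np.cov`; `population_covariance` is a hypothetical helper written for this illustration.

```python
import numpy as np

x = [2, 4, 6, 8]
y = [1, 3, 2, 5]

def population_covariance(a, b):
    """Average product of the deviations of a and b from their means (divides by n)."""
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    return sum((ai - mean_a) * (bi - mean_b) for ai, bi in zip(a, b)) / len(a)

cov_manual = population_covariance(x, y)
cov_numpy = np.cov(x, y, bias=True)[0, 1]

print("Manual covariance:", cov_manual)
print("NumPy covariance:", cov_numpy)
```

Both values should agree up to floating-point rounding, which is also why the two correlation results above differ only in the last digit.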

## Mini Exercises
1. Create a list of 7 numbers. Calculate mean, median, and mode.
2. Find the range, variance, and standard deviation of an example list.
3. Calculate covariance and correlation between two simple lists.
4. Write a Python function to compute the mean manually using a loop.

## Conclusion and Q&A
Statistical measures help us summarize, understand, and **trust** our data.
They’re the building blocks of data analysis in Data Science.

**Any questions? Let’s explore together!**