### Outlier Detection

#### What is an Outlier?

An outlier is a data point that is very different from the majority of the data.
It “stands out” because it’s unusually high or low compared to the rest.

Example:

In a class where most scores fall between 50 and 80, a score of 0 or 100 stands far from the rest, so it would be an outlier.

#### Why Detect/Remove Outliers?

They can distort statistical measures:

- The mean (average) is pulled toward the outlier and becomes misleading.
- The standard deviation becomes inflated.
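To see this distortion in numbers, here is a small sketch using Python's standard `statistics` module. The scores are made up for illustration; the only difference between the two lists is a single score of 100.

```python
from statistics import mean, stdev

# Illustrative class scores (an assumption for this sketch)
typical_scores = [70, 72, 68, 74, 71, 73, 69, 75, 72]
with_outlier = typical_scores + [100]

mean_typical = mean(typical_scores)
mean_outlier = mean(with_outlier)
sd_typical = stdev(typical_scores)   # sample standard deviation
sd_outlier = stdev(with_outlier)

print(f"Mean without outlier: {mean_typical:.2f}")   # 71.56
print(f"Mean with outlier:    {mean_outlier:.2f}")   # 74.40
print(f"Std without outlier:  {sd_typical:.2f}")     # 2.30
print(f"Std with outlier:     {sd_outlier:.2f}")     # 9.25
```

One extra value moves the mean by almost 3 points and makes the standard deviation roughly four times larger.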

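A common way to measure how far a value lies from the rest of the data is the z-score:

$$Z = \frac{X - \mu}{\sigma}$$

where $\mu$ is the mean and $\sigma$ is the standard deviation.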
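The z-score formula above can be sketched in a few lines of standard-library Python. The scores and the cutoff of 2.5 are assumptions chosen for illustration, and `stdev` here is the sample standard deviation.

```python
from statistics import mean, stdev

# Illustrative scores (an assumption for this sketch)
scores = [70, 72, 68, 74, 71, 73, 69, 75, 72, 100]

mu = mean(scores)       # mean of the data
sigma = stdev(scores)   # sample standard deviation

# Z = (X - mean) / standard deviation, computed for every score
z_scores = [(x - mu) / sigma for x in scores]

# Hypothetical cutoff: flag values more than 2.5 standard deviations from the mean
THRESHOLD = 2.5
outliers = [x for x, z in zip(scores, z_scores) if abs(z) > THRESHOLD]

print(outliers)  # [100]
```

In this small sample the score of 100 has a z-score of about 2.77, so the often-quoted cutoff of 3 would not flag it. The outlier itself inflates the standard deviation, which is one reason the threshold is a judgment call on small datasets.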

### Topic 2: IQR (Interquartile Range) Method

#### IQR Method Basics

Q1 = 25th percentile (value below which 25% of data lies)

Q3 = 75th percentile (value below which 75% of data lies)

IQR = Q3 – Q1

📌 Rule for Outliers:

Any value < Q1 – 1.5 × IQR → Lower Outlier

Any value > Q3 + 1.5 × IQR → Upper Outlier
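To make the rule concrete, here is a minimal sketch with hypothetical quartiles (Q1 = 60, Q3 = 70) for a class whose scores mostly fall between 50 and 80. The function name `classify` and its parameters are made up for this illustration.

```python
def classify(value, q1, q3, k=1.5):
    """Label a value using the IQR rule with multiplier k."""
    iqr = q3 - q1
    lower_bound = q1 - k * iqr
    upper_bound = q3 + k * iqr
    if value < lower_bound:
        return "lower outlier"
    if value > upper_bound:
        return "upper outlier"
    return "not an outlier"

# Hypothetical quartiles: IQR = 10, so the bounds are 45 and 85
for score in (0, 55, 100):
    print(score, "->", classify(score, q1=60, q3=70))
# 0 -> lower outlier
# 55 -> not an outlier
# 100 -> upper outlier
```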

```python
import pandas as pd

# Step 1: Dataset with an outlier
scores = [70, 72, 68, 74, 71, 73, 69, 75, 72, 100]
df = pd.DataFrame(scores, columns=['score'])

# Step 2: Calculate Q1, Q3, and IQR
Q1 = df['score'].quantile(0.25)
Q3 = df['score'].quantile(0.75)
IQR = Q3 - Q1

# Step 3: Define bounds
lower_bound = Q1 - 1.5 * IQR
upper_bound = Q3 + 1.5 * IQR

# Step 4: Detect outliers (vectorized comparison on the whole column)
df['outlier_iqr'] = (df['score'] < lower_bound) | (df['score'] > upper_bound)

print("📌 Dataset WITH Outlier Flag:")
print(df)

# Step 5: Remove outliers by keeping only the rows that are not flagged
df_no_outliers = df[~df['outlier_iqr']]

print("\n📌 Dataset AFTER Removing Outliers:")
print(df_no_outliers)
```




```text
📌 Dataset WITH Outlier Flag:
   score  outlier_iqr
0     70        False
1     72        False
2     68        False
3     74        False
4     71        False
5     73        False
6     69        False
7     75        False
8     72        False
9    100         True
```
